Building a Vocal Range Detection Program in Pygame and Music21

Can you sing?

I certainly can’t yet. And I’d have no idea where to begin. But I can program, and programming is a tool for exploring new universes.

So today, we’ll build the beginning of a game that teaches us to sing: a vocal range detector.

It’ll tell us the lowest and highest notes we’re capable of singing, and we can then use that range to practice hitting notes we can actually reach.

Some Background on Vocal Range

A person’s vocal range is the set of notes that they’re capable of singing. It differs according to a person’s genetics.

We usually describe a person’s vocal range by the number of octaves it spans. Somebody like Freddie Mercury could reportedly sing from F2 to D6, which is nearly four octaves of range.
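
To make the octave arithmetic concrete, here is a minimal sketch in plain Python. The helpers note_to_midi and range_in_octaves are hypothetical names for this illustration (they are not part of music21), and the sketch assumes note names written with "#" for sharps and "b" for flats.

```python
# Semitone offsets of the natural notes within one octave (C = 0).
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a note name such as 'F2' or 'C#4' to a MIDI note number."""
    letter, rest = name[0].upper(), name[1:]
    accidental = 0
    # Assumption: sharps are written '#', flats are written 'b'.
    while rest and rest[0] in "#b":
        accidental += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    # MIDI convention: C4 (middle C) is note 60, so octave -1 starts at 0.
    return (octave + 1) * 12 + NOTE_OFFSETS[letter] + accidental


def range_in_octaves(low, high):
    """Return the distance between two notes, measured in octaves."""
    return (note_to_midi(high) - note_to_midi(low)) / 12


print(range_in_octaves("F2", "D6"))  # 3.75
```

F2 to D6 is 45 semitones, or 3.75 octaves, which is where "nearly four octaves" comes from.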

Our program will help us to figure out our own vocal range, by first doing pitch detection on our voice, and then turning that pitch into a musical note.
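
In the program below, music21 does the pitch-to-note conversion for us. As a rough sketch of what that conversion involves, here is the standard equal-temperament formula in plain Python. The function name frequency_to_note is hypothetical, and tuning A4 to 440 Hz is an assumption of this sketch.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Return (note name with octave, cents offset) for a frequency in Hz."""
    # Each semitone is a factor of 2 ** (1 / 12); MIDI note 69 is A4.
    midi_float = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(midi_float)
    # Cents are hundredths of a semitone away from the nearest note.
    cents = (midi_float - nearest) * 100
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


print(frequency_to_note(440.0))  # ('A4', 0.0)
print(frequency_to_note(450.0))  # A4 again, but roughly 39 cents sharp
```

The cents value is the part that matters for singing practice: it says how far the voice is from the nearest note, and in which direction.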

If you’re not sure about octaves and notes, read my post about making a voice-controlled Flappy Bird to get started.

Detecting A Voice

We’ll build on the program we wrote for the video synthesizer, and break it out into the beginning of a library.

import aubio
import numpy as np
import pyaudio

import time
import argparse

import queue

import music21  # yes! new favorite library

parser = argparse.ArgumentParser()
parser.add_argument("-input", required=False, type=int, help="Audio Input Device")
args = parser.parse_args()

if args.input is None:  # "not args.input" would wrongly reject device 0
    print("No input device specified. Printing list of input devices now: ")
    p = pyaudio.PyAudio()
    for i in range(p.get_device_count()):
        print("Device number (%i): %s" % (i, p.get_device_info_by_index(i).get('name')))
    print("Run this program with -input 1, or the number of the input you'd like to use.")
    exit()

# PyAudio object.
p = pyaudio.PyAudio()

# Open stream.
stream = p.open(format=pyaudio.paFloat32,
                channels=1, rate=44100, input=True,
                input_device_index=args.input, frames_per_buffer=4096)
time.sleep(1)

# Aubio's pitch detection.
pDetection = aubio.pitch("default", 2048, 2048//2, 44100)
# Set unit.
pDetection.set_unit("Hz")
pDetection.set_silence(-40)

q = queue.Queue()


def get_current_note(volume_thresh=0.01, printOut=False):
    """Puts the note currently being sung on the q object when audio is present
    
    Keyword arguments:

    volume_thresh -- the volume threshold for input. defaults to 0.01
    printOut -- whether or not to print to the terminal. defaults to False
    """
    current_pitch = music21.pitch.Pitch()

    while True:

        data = stream.read(1024, exception_on_overflow=False)
        samples = np.frombuffer(data,  # np.fromstring is deprecated for binary data
                                dtype=aubio.float_type)
        pitch = pDetection(samples)[0]

        # Compute the energy (volume) of the
        # current frame.
        volume = np.sum(samples**2)/len(samples) * 100

        if pitch and volume > volume_thresh:  # adjust with your mic!
            current_pitch.frequency = pitch
        else:
            continue

        if printOut:
            print(current_pitch)
        
        else:
            current = current_pitch.nameWithOctave
            q.put({'Note': current, 'Cents': current_pitch.microtone.cents})

if __name__ == '__main__':
    get_current_note(volume_thresh=0.001, printOut=True)

Now, there are a few things I want to call out in the above code.

First, we’ve started using docstrings in our function above. This lets us look up what a function does interactively, for example by typing get_current_note? in the IPython shell.

We’ve also added some default parameters to our function. This means we can leave these parameters out of our function call, and they’ll fall back to their default values.

Finally, at the end of our program, we have an if __name__ == '__main__' statement.

The code under it won’t run when we import this file. It’s a great feature, and allows us to start thinking of our voiceController.py as an actual library, and not just a standalone Python 3 program.
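
Here is a small sketch that puts all three ideas together in one place. The function is_loud_enough is a hypothetical helper, not part of voiceController.py. It reuses the same volume formula as the listing above, but works on a plain list of floats so it runs without a microphone.

```python
def is_loud_enough(samples, volume_thresh=0.01):
    """Return True when the frame's volume exceeds the threshold.

    Keyword arguments:

    volume_thresh -- the volume threshold for input. defaults to 0.01
    """
    if not samples:
        return False
    # Same formula as the listing: mean of the squared samples, times 100.
    volume = sum(s * s for s in samples) / len(samples) * 100
    return volume > volume_thresh


if __name__ == "__main__":
    # Only runs when this file is executed directly, not when it is imported.
    quiet = [0.001] * 1024
    loud = [0.1] * 1024
    print(is_loud_enough(quiet))                         # default threshold
    print(is_loud_enough(loud))
    print(is_loud_enough(quiet, volume_thresh=0.00001))  # override the default
```

Calling help(is_loud_enough) in a Python shell prints the docstring, which is the interactive querying mentioned above.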

Animating the Voice Detection with Pygame

In Pygame, we’ll create a line graph, and draw the person’s voice deviation from a proper note.

To support this, we’ll also create two different font types, and draw them to the screen.

Once we’ve got a low note, we can then get a high note, and have a complete vocal range for our user.

The program is straightforward enough, and because it imports our voiceController, it ends up looking like the rest of the programs on Make Art with Python: small and easily readable.

from threading import Thread
import pygame

from voiceController import q, get_current_note

pygame.init()

screenWidth, screenHeight = 288, 512
screen = pygame.display.set_mode((screenWidth, screenHeight))
clock = pygame.time.Clock()

running = True

titleFont = pygame.font.Font("assets/Bungee-Regular.ttf", 34)
titleText = titleFont.render("Sing a", True, (0, 128, 0))
titleCurr = titleFont.render("Low Note", True, (0, 128, 0))

noteFont = pygame.font.Font("assets/Roboto-Medium.ttf", 55)

t = Thread(target=get_current_note)
t.daemon = True
t.start()


low_note = ""
high_note = ""
have_low = False
have_high = False

noteHoldLength = 20  # how many samples in a row user needs to hold a note
noteHeldCurrently = ""  # string of the current note
noteHeld = 0  # keep track of how long current note is held

centTolerance = 20  # how much deviance from proper note to tolerate

while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        if event.type == pygame.KEYDOWN and event.key == pygame.K_q:
            running = False

    screen.fill((0, 0, 0))

    # draw line to show visually how far away from note voice is
    pygame.draw.line(screen, (255, 255, 255), (10, 290), (10, 310))
    pygame.draw.line(screen, (255, 255, 255), (screenWidth - 10, 290),
                     (screenWidth - 10, 310))
    pygame.draw.line(screen, (255, 255, 255), (10, 300),
                     (screenWidth - 10, 300))

    # our user should be singing if there's a note on the queue
    if not q.empty():
        b = q.get()
        if abs(b['Cents']) < centTolerance:  # in tune, whether sharp or flat
            pygame.draw.circle(screen, (0, 128, 0),
                               (screenWidth // 2 + (int(b['Cents']) * 2), 300),
                               5)
        else:
            pygame.draw.circle(screen, (128, 0, 0),
                               (screenWidth // 2 + (int(b['Cents']) * 2), 300),
                               5)

        noteText = noteFont.render(b['Note'], True, (0, 128, 0))
        if b['Note'] == noteHeldCurrently:
            noteHeld += 1
            if noteHeld == noteHoldLength:
                if not have_low:
                    low_note = noteHeldCurrently
                    have_low = True
                    titleCurr = titleFont.render("High Note", True, 
                                                 (128, 128, 0))
                else:
                    if int(noteHeldCurrently[-1]) <= int(low_note[-1]):
                        noteHeld = 0  # not in a higher octave than the low note
                    elif not high_note:
                        high_note = noteHeldCurrently
                        have_high = True
                        titleText = titleFont.render("Perfect!", True,
                                                     (0, 128, 0))
                        titleCurr = titleFont.render("%s to %s" % 
                                                     (low_note, high_note), 
                                                     True, (0, 128, 0))
        else:
            noteHeldCurrently = b['Note']
            noteHeld = 1
        screen.blit(noteText, (50, 400))

    screen.blit(titleText, (10,  80))
    screen.blit(titleCurr, (10, 120))
    pygame.display.flip()
    clock.tick(30)

You’ll see we follow the same pattern as our video synthesizer, creating a separate thread to handle the detection of vocals and vocal pitch.
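
The pattern itself is a producer thread pushing readings onto a queue.Queue while the main loop consumes them. Here is a minimal sketch using only the standard library. fake_detector is a hypothetical stand-in for get_current_note, and drain is a hypothetical helper that is not in the listing above.

```python
import queue
import threading
import time

q = queue.Queue()


def fake_detector(notes, delay=0.01):
    """Stand-in for get_current_note: pushes readings onto the shared queue."""
    for note in notes:
        q.put({"Note": note, "Cents": 0})
        time.sleep(delay)


def drain(q):
    """Return every reading currently on the queue without blocking."""
    readings = []
    while True:
        try:
            readings.append(q.get_nowait())
        except queue.Empty:
            return readings


worker = threading.Thread(target=fake_detector,
                          args=(["C3", "C3", "D3"],), daemon=True)
worker.start()
worker.join()  # the real detector loops forever, so the game never joins it
print([r["Note"] for r in drain(q)])  # ['C3', 'C3', 'D3']
```

One design note: the game loop above takes a single reading per frame at 30 frames per second, while the detector can produce one reading per 1024-sample read at 44100 Hz, which is about 43 per second. If the display ever seems to lag behind your voice, draining the queue each frame and keeping only the latest reading is one possible adjustment.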

Besides this, we update the text being drawn on the screen as our user holds each specific note.
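
The note-holding logic can be pulled out of the game loop and looked at on its own. This is a sketch of the same counting idea, wrapped in a hypothetical NoteHoldTracker class that the listing above does not use: a note counts as held once it has arrived a set number of times in a row.

```python
class NoteHoldTracker:
    """Counts consecutive readings of the same note."""

    def __init__(self, hold_length=20):
        self.hold_length = hold_length  # readings in a row needed to count
        self.current = ""               # name of the note being tracked
        self.count = 0                  # consecutive readings of that note

    def update(self, note):
        """Feed one reading; return the note once it is held, else None."""
        if note == self.current:
            self.count += 1
        else:
            # A different note resets the streak.
            self.current = note
            self.count = 1
        if self.count == self.hold_length:
            return note
        return None


tracker = NoteHoldTracker(hold_length=3)
readings = ["C3", "C3", "D3", "D3", "D3", "D3"]
print([tracker.update(n) for n in readings])
# [None, None, None, None, 'D3', None]
```

The note is reported exactly once, on the reading that completes the streak, which mirrors the noteHeld == noteHoldLength check in the game loop.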

Testing it In Action

I’ve been playing with this vocal range detector for a few hours, and the direct feedback of how far off I am from the notes helps me to determine what holding a note feels like.
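
That feedback is just the dot’s position on the line: the game offsets it two pixels per cent from the center of the screen. Here is a sketch of the mapping with two hypothetical helpers, cents_to_x and dot_color. The clamping to the ends of the line and the symmetric tolerance (flat readings treated the same as sharp ones) are assumptions of this sketch.

```python
def cents_to_x(cents, screen_width=288, pixels_per_cent=2, margin=10):
    """Map a cents offset to an x coordinate on the feedback line."""
    center = screen_width // 2
    x = center + int(cents) * pixels_per_cent
    # Assumption: keep the dot between the two end markers of the line.
    return max(margin, min(screen_width - margin, x))


def dot_color(cents, tolerance=20):
    """Green when within tolerance of the note (sharp or flat), else red."""
    return (0, 128, 0) if abs(cents) < tolerance else (128, 0, 0)


print(cents_to_x(0))    # 144, the center of a 288 pixel wide screen
print(cents_to_x(-50))  # 44, as flat as a reading can be
print(dot_color(-30))   # (128, 0, 0)
```

Since a reading is never more than 50 cents from its nearest note, the dot stays within 100 pixels of the center on this screen size.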

I’d say this small, simple project is a success. I’ve got a basic sketch that is easy to understand, and it already helps me practice holding notes.

Where to Go From Here

This post is part of a series, building a musical game from scratch. The code for this specific post is here.

I want to end up with a game that teaches the player how to sing properly, along with basic rhythm and music knowledge.

If you’re interested in following along, I encourage you to sign up for the Make Art with Python mailing list, and to create an account here on Make Art with Python.

Also, feel free to share this post with your friends, as it helps me to continue making these sorts of tutorials.

Kirk Kaiser
