Deep Learning from Scratch on the Jetson Nano

Jetson Nano

For the past few years, getting started with machine learning has been expensive.

In order to train your own models, you need either a beefy NVIDIA GPU or cash to burn on renting cloud machines. Playing around with machine learning outside of work has generally meant building a desktop computer and running Linux on it.

Recently, that’s changed.

NVIDIA has released a few new embedded computers that can do machine learning at a reasonable cost. The most exciting (and most recent!) platform is the NVIDIA Jetson Nano.

It’s basically a super-powered Raspberry Pi with a GPU built right in. In today’s post, we’ll use it to run the tf-pose-estimation model at ~4.5 frames per second, and then collect data for and train a dab and T-pose detection model directly on the device.

So let’s dive in, and see how we can build machine learning models on the $99 Jetson Nano.

First, Gather the Hardware

Webcam, Jetson Nano, and Z Stick

As a first project, I’ll recreate my dab and T-pose controlled lights from a previous post on the Jetson Nano.

I’m using a Logitech C920 webcam, along with a Z-Wave Z-Stick in order to control a Z-Wave switch.

The main benefit of using Z-Wave is that it’s completely separate from my home WiFi network. I didn’t want to add devices to my WiFi, but I still wanted to be able to control my lights from anywhere in my house, or anywhere else I take the device, without having to configure anything new.

No cloud required!

Architect a Solution

Architecture of Jetson Nano Dab T-Pose

Before I get too far into any new project, I like to verify that my overall idea will work.

As we saw before, the Jetson TX2 only got a few frames per second running the OpenPose model. Since the Nano is a less powerful embedded system, the first thing I want to verify is that the pose detection model we pick runs at a usable frame rate.

After poking around a bit, it seems the tf-pose-estimation repo gets ~10fps on the TX2. That’s much better than the 1.5fps we got in the original project, so it should be fast enough to try on the Nano.

But running tf-pose-estimation shows that its outputs no longer match the OpenPose outputs: instead of 25 keypoints, we have 18 to work with. Because of that, we’ll need to retrain the model from the last post on the new input format.
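To make the change concrete: the classifier sees two coordinates (x, y) per keypoint, so going from 25 keypoints to 18 shrinks each example from 50 input values to 36. The sketch below assumes the 18-point COCO order used by tf-pose-estimation's `CocoPart` enum; `COCO_PARTS`, `feature_count` and `flatten_pose` are hypothetical helpers for illustration, not part of the repo.

```python
import numpy as np

# 18 COCO keypoints in index order (assumed to match tf-pose-estimation's CocoPart enum)
COCO_PARTS = [
    "Nose", "Neck",
    "RShoulder", "RElbow", "RWrist",
    "LShoulder", "LElbow", "LWrist",
    "RHip", "RKnee", "RAnkle",
    "LHip", "LKnee", "LAnkle",
    "REye", "LEye", "REar", "LEar",
]


def feature_count(num_keypoints, coords_per_point=2):
    """Number of input values per person for a given keypoint layout."""
    return num_keypoints * coords_per_point


def flatten_pose(pose):
    """Flatten an (N, 2) array of (x, y) keypoints into a 1-D feature vector."""
    return np.asarray(pose, dtype=np.float32).reshape(-1)


print(feature_count(25), "inputs with OpenPose BODY_25")
print(feature_count(len(COCO_PARTS)), "inputs with COCO (tf-pose-estimation)")
```

Because the input size is baked into the first layer of the old network, its weights can't simply be reused with the new 36-value input, which is why retraining is needed.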

Download and Install Jetson Nano version of Tensorflow and Keras

Tensorflow Logo

NVIDIA maintains an optimized version of Tensorflow built specifically for the Jetson platform. As tf-pose-estimation requires Tensorflow, we’ll need to install it first.

Begin by installing the requirements:

$ sudo apt-get update
$ sudo apt-get install python3-pip libhdf5-serial-dev hdf5-tools libhdf5-dev \
    zlib1g-dev zip libjpeg8-dev libatlas-base-dev gfortran
$ sudo pip3 install -U pip
$ sudo pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker \
    six mock requests gast h5py astor termcolor protobuf keras-applications \
    keras-preprocessing wrapt google-pasta

The pip3 install on the last line will take a few minutes. Once it’s complete, we’re ready to install NVIDIA’s distribution of Tensorflow for the Jetson Nano:

$ sudo pip3 install --pre --extra-index-url \
    https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu

Verify that everything works:

$ sudo python3 -m pip install ipython keras Cython 
$ ipython
>>> import tensorflow
>>> import keras
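If you'd rather check from a script, something like the following reports which packages import (and their versions) and whether TensorFlow can see the Nano's GPU. It's a sketch: `check_imports` and `tensorflow_sees_gpu` are hypothetical helpers, and which GPU-query call is available depends on the TensorFlow version you installed, so the code checks for the newer API before falling back.

```python
import importlib


def check_imports(names):
    """Map each module name to its version string, or None if it can't be imported."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            results[name] = getattr(module, "__version__", "unknown")
    return results


def tensorflow_sees_gpu():
    """Best-effort GPU check; returns False if TensorFlow isn't installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return False
    # Newer releases provide tf.config.list_physical_devices("GPU");
    # TF 1.x releases provide tf.test.is_gpu_available() instead.
    if hasattr(tf, "config") and hasattr(tf.config, "list_physical_devices"):
        return len(tf.config.list_physical_devices("GPU")) > 0
    return tf.test.is_gpu_available()


if __name__ == "__main__":
    for name, version in check_imports(["tensorflow", "keras", "numpy"]).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
    print("GPU visible to TensorFlow:", tensorflow_sees_gpu())
```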

Once we’ve verified that Tensorflow is set up, we can move on to installing our pose estimation model, so we can collect webcam data to train a new model on.

Download and Setup tf-pose-estimation

To get a pose estimator up and running on the Jetson Nano, we’ll first need to install the LLVM dependencies:

$ sudo apt-get install libllvm-7-ocaml-dev libllvm7 llvm-7 llvm-7-dev \
llvm-7-doc llvm-7-examples llvm-7-runtime libxft-dev swig
$ export LLVM_CONFIG=/usr/bin/llvm-config-7 

Then we can follow along with the rest of the instructions from the repo:

$ git clone https://www.github.com/ildoonet/tf-pose-estimation
$ cd tf-pose-estimation
$ pip3 install -r requirements.txt
$ cd tf_pose/pafprocess
$ swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace

With this, we can go back to the repository root and verify that everything runs:

$ cd ../..
$ python3 run_webcam.py --model=mobilenet_thin --resize=368x368 --camera=0

And we should get our first verification that everything is working properly:

… But we can improve that performance! If you grab a 5V 4A barrel-jack power supply, you can place a jumper on the J48 header so the Jetson Nano draws power from the barrel jack instead of micro-USB, and then switch to the high-power (MAXN) mode for better performance. It’s as easy as:

$ sudo nvpmodel -m 0

With that, you should get ~1 more fps! Not bad!

Design and Collect Data for Training Your Network

Collecting Sample Data

Now that we’ve verified our inference process works, we can move on to gathering data to train.

It’s a very similar process to the last post, but keep in mind that our new model outputs data in a different format.

Instead of 25 points, we’re working with only 18. And we don’t need to convert our pixel coordinates to fractions of the image size, because the pose detection model already outputs normalized values between 0 and 1.

Again, we’ll use keyboard keys to choose which label we’re capturing. Press b to capture a dab, m for a T-pose, and / for the ‘other’ category covering all other poses (q quits).

The main capture loop looks like this:

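    import argparse
    import time

    import cv2
    import numpy as np
    from tf_pose.estimator import TfPoseEstimator
    from tf_pose.networks import get_graph_path, model_wh

    # Same options as run_webcam.py in the tf-pose-estimation repo
    parser = argparse.ArgumentParser(description='Capture labeled poses from a webcam')
    parser.add_argument('--camera', type=int, default=0)
    parser.add_argument('--resize', type=str, default='368x368')
    parser.add_argument('--resize-out-ratio', type=float, default=4.0)
    parser.add_argument('--model', type=str, default='mobilenet_thin')
    args = parser.parse_args()

    w, h = model_wh(args.resize)
    target_size = (w, h) if w > 0 and h > 0 else (432, 368)
    e = TfPoseEstimator(get_graph_path(args.model), target_size=target_size)
    cam = cv2.VideoCapture(args.camera)

    # One list of captured 18x2 keypoint arrays per label
    dabs, tposes, other = [], [], []
    fps_time = time.time()

    while True:
        ret_val, image = cam.read()
        if not ret_val:
            break  # camera disconnected or no frame available

        humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=args.resize_out_ratio)

        infer = []
        for human in humans:
            hummie = []
            # tf-pose-estimation uses the 18-keypoint COCO layout:
            # https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/output.md#pose-output-format-coco
            for i in range(18):
                if i in human.body_parts:
                    # x and y are already normalized to 0..1
                    hummie.append(np.array([human.body_parts[i].x, human.body_parts[i].y], dtype=np.float32))
                else:
                    # Keypoint not detected in this frame: pad with zeros
                    hummie.append(np.array([0.0, 0.0], dtype=np.float32))
            infer.append(hummie)

        image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)

        cv2.putText(image,
                    "FPS: %f" % (1.0 / (time.time() - fps_time)),
                    (10, 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    (0, 255, 0), 2)
        cv2.imshow('tf-pose-estimation result', image)
        fps_time = time.time()

        # Check for input: q quits; b, m and / label the first detected person
        key = cv2.waitKey(1) & 0xFF
        if key == ord("q"):
            break
        elif not infer:
            continue  # nobody detected in this frame, so there is nothing to label
        elif key == ord("b"):
            print("Dab: " + str(infer[0]))
            dabs.append(infer[0])
        elif key == ord("m"):
            print("TPose: " + str(infer[0]))
            tposes.append(infer[0])
        elif key == ord("/"):
            print("Other: " + str(infer[0]))
            other.append(infer[0])

    cam.release()
    cv2.destroyAllWindows()

    # Each array has shape (num_examples, 18, 2)
    dabs = np.asarray(dabs)
    tposes = np.asarray(tposes)
    other = np.asarray(other)

    np.save('COCOdabs.npy', dabs)
    np.save('COCOtposes.npy', tposes)
    np.save('COCOother.npy', other)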

Note that we write out each set of examples as its own NumPy binary (.npy) file.

Later, when we train our new detection network, each set will be ready to load.
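As a sketch of what "ready to load" can look like, the snippet below reads the three files and stacks them into a feature matrix and a label vector. The label ids are an assumption: the light-control code later treats class 1 as a dab and class 2 as a T-pose, so "other" is assumed to be 0. `build_dataset`, `load_dataset` and `one_hot` are hypothetical helpers, not code from the training notebook.

```python
import os

import numpy as np

# Assumed class ids: 1 = dab and 2 = T-pose match the light-control code;
# 0 for "other" is an assumption.
OTHER, DAB, TPOSE = 0, 1, 2


def build_dataset(dabs, tposes, other):
    """Stack per-label arrays of shape (N, 18, 2) into X (M, 36) and y (M,)."""
    groups = [(other, OTHER), (dabs, DAB), (tposes, TPOSE)]
    X = np.concatenate(
        [np.asarray(g, dtype=np.float32).reshape(-1, 18 * 2) for g, _ in groups])
    y = np.concatenate(
        [np.full(len(g), label, dtype=np.int64) for g, label in groups])
    return X, y


def load_dataset(directory="."):
    """Load the three .npy files written by the capture script."""
    def load(name):
        return np.load(os.path.join(directory, name))
    return build_dataset(load("COCOdabs.npy"),
                         load("COCOtposes.npy"),
                         load("COCOother.npy"))


def one_hot(y, num_classes=3):
    """One-hot encode integer labels, e.g. for a softmax output layer."""
    return np.eye(num_classes, dtype=np.float32)[y]
```

Flattening to 36 values per example keeps the model simple; a network with its own `Flatten` layer could take the (18, 2) arrays directly instead.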

Training On Your Cleaned Data

Training Network in Jupyter Notebook

Although I could train the new network on my own computer, I’ve noticed better inference accuracy when I train directly on the Jetson Nano. So the next step is to set up our environment to run the Jupyter notebook for training:

$ sudo python3 -m pip install jupyter pandas seaborn matplotlib scikit-learn
$ jupyter notebook

Open the notebook (just click on Data Play.ipynb), then press SHIFT+ENTER to execute each cell in order.

At the end, we should have a trained model ready to go.

Connecting Our Model to the Physical World with Z-Wave

The first thing we’ll need to do is install the libraries for the Z-Wave USB stick. Luckily, we’ve got prebuilt versions of most things, and adding the Z-Wave Python libraries is just a couple of lines:

$ sudo apt-get install --force-yes -y make libudev-dev g++ libyaml-dev openzwave1.5
$ sudo python3 -m pip install python_openzwave

With these installed, we just need to add the light-switching logic to our existing detection loop. The main code to turn the lights on and off looks like this:

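            # tposer: the trained Keras classifier. network: the python-openzwave
            # ZWaveNetwork. LIGHTS (1 = on, 0 = off), bounced (time of the last
            # change) and debounce (seconds) are set up before the loop.
            if infer:  # skip prediction when nobody is in the frame
                # argmax of the softmax output gives the same class ids as
                # predict_classes, which newer Keras releases removed
                output = np.argmax(tposer.predict(np.array(infer, dtype=np.float32)), axis=1)
                for j in output:
                    if j == 1:
                        print("dab detected")
                        # Already off, or changed too recently: ignore
                        if LIGHTS == 0 or (time.time() - bounced) < debounce:
                            continue
                        # A dab turns every Z-Wave switch off
                        for node in network.nodes:
                            for val in network.nodes[node].get_switches():
                                network.nodes[node].set_switch(val, False)
                        LIGHTS = 0
                        bounced = time.time()
                    elif j == 2:
                        print("tpose detected")
                        # Already on, or changed too recently: ignore
                        if LIGHTS == 1 or (time.time() - bounced) < debounce:
                            continue
                        # A T-pose turns every Z-Wave switch on
                        for node in network.nodes:
                            for val in network.nodes[node].get_switches():
                                network.nodes[node].set_switch(val, True)
                        LIGHTS = 1
                        bounced = time.time()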

And now we can control our lights with dance!
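The on/off-with-debounce logic in the detection loop can be pulled out into a small class that's easy to test without a Z-Wave stick plugged in. This is a sketch, not the code from the repo: `LightController` is hypothetical, the 2-second default debounce is an arbitrary assumption, and `set_lights` stands in for the `set_switch` loops over `network.nodes`.

```python
import time

DAB, TPOSE = 1, 2  # class ids used by the detection loop


class LightController:
    """Debounced on/off switching driven by pose classes.

    set_lights: any callable taking a bool (e.g. a wrapper around the
    Z-Wave set_switch loops). clock is injectable so this can be tested.
    """

    def __init__(self, set_lights, debounce=2.0, clock=time.monotonic):
        self.set_lights = set_lights
        self.debounce = debounce
        self.clock = clock
        self.lights_on = None  # unknown until the first command
        self.last_change = float("-inf")

    def handle(self, pose_class):
        """Apply one classifier output; return True if the lights changed."""
        if pose_class == DAB:
            want_on = False  # a dab turns the lights off
        elif pose_class == TPOSE:
            want_on = True  # a T-pose turns them on
        else:
            return False  # "other" pose: leave the lights alone
        now = self.clock()
        # Ignore repeats of the current state and changes inside the debounce window
        if want_on == self.lights_on or now - self.last_change < self.debounce:
            return False
        self.set_lights(want_on)
        self.lights_on = want_on
        self.last_change = now
        return True
```

In the loop, calling `controller.handle(j)` for each `j` in `output` would replace the two branches, with `set_lights` wrapping the nested `set_switch` calls. Using `time.monotonic` as the default clock keeps the debounce immune to wall-clock adjustments.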

What’s Next?

If you noticed the name of the GitHub repo attached to this post, you’ve seen that I plan on controlling parts of my new home with dance and plants.

I plan on running all this from my Jetson Nano. I’ll add a Teensy microcontroller to handle my plant touch sensing, but we’ll save that for the next post.
