Getting Started with Kinect and Processing

Kinect and Processing

The Microsoft Kinect sensor is a peripheral device (designed for Xbox and Windows PCs) that functions much like a webcam. However, in addition to providing an RGB image, it also provides a depth map. This means that for every pixel seen by the sensor, the Kinect measures its distance from the sensor. This makes a variety of computer vision problems, such as background removal and blob detection, easy and fun!

The Kinect sensor itself only measures color and depth. However, once that information is on your computer, much more can be done, such as “skeleton” tracking (i.e., detecting a model of a person and tracking their movements). To do skeleton tracking, you’ll need to use Thomas Lengling’s Windows-only Kinect v2 Processing library. However, if you’re on a Mac and all you want is raw data from the Kinect, you are in luck! This library uses the open source libfreenect and libfreenect2 drivers to access that data on Mac OS X (Windows support coming soon).

What hardware do I need?

First, you need a “stand-alone” Kinect. You do not need to buy an Xbox.

Some additional notes about different models:

SimpleOpenNI

You could also consider using the SimpleOpenNI library and reading Greg Borenstein’s book Making Things See. OpenNI has features (skeleton tracking, gesture recognition, etc.) that are not available in this library. Unfortunately, OpenNI was recently purchased by Apple. While I thought it was shut down, there appear to be some efforts to revive it! The future of OpenNI and SimpleOpenNI is unclear.

I’m ready to get started right now

The easiest way to install the library is with the Processing Contribution Manager. Choose Sketch → Import Library → Add Library and search for “Kinect”. A button labeled “Install” will appear. To install it manually, download the most recent release and extract it into the libraries folder. Restart Processing, open one of the examples in the examples folder, and you are good to go!

What is Processing?

Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing also has evolved into a tool for generating finished professional work. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.

What if I don’t want to use Processing?

If you are comfortable with C++, I suggest using openFrameworks or Cinder with the Kinect. These environments have some additional features, and you may also get a speed advantage from C++ when processing the depth data.

What code do I write?

The first step is to include the proper import statement at the top of your code:

import org.openkinect.processing.*;

You also need a reference to a Kinect object:

Kinect kinect;

Then, in setup(), you can initialize that Kinect object:

void setup() {
  kinect = new Kinect(this);
  kinect.initDevice();
}

If you are using a Kinect v2, use a Kinect2 object instead.

Kinect2 kinect2;

void setup() {
  kinect2 = new Kinect2(this);
  kinect2.initDevice();
}

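Once you’ve done this, you can begin to access data from the Kinect sensor. Currently, the library makes data available to you in five ways:

  1. the RGB video image (a PImage)
  2. the grayscale IR image (a PImage)
  3. the grayscale depth image (a PImage)
  4. the color depth image (a PImage)
  5. the raw depth data (an int[] array)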

Let’s look at these one at a time. If you want to use the Kinect just like a regular old webcam, you can access the video image as a PImage!

PImage img = kinect.getVideoImage();
image(img, 0, 0);

You can simply ask for this image in draw(). However, you can also use videoEvent() to know when a new image is available.

void videoEvent(Kinect k) {
  // There has been a video event!
}

If you want the IR image:

kinect.enableIR(true);

With the Kinect v1, you cannot get both the video image and the IR image at the same time. Both are returned by getVideoImage(), so whichever one was most recently enabled is the one you will get. The Kinect v2, however, provides them through separate methods:

PImage video = kinect2.getVideoImage();
PImage ir = kinect2.getIrImage();

Now, if you want the depth image, you can request the grayscale image:

PImage img = kinect.getDepthImage();
image(img, 0, 0);

You can also get the raw depth data:

int[] depth = kinect.getRawDepth();

For the Kinect v1, the raw depth values range from 0 to 2047 (11-bit values). For the Kinect v2, the range is from 0 to 4500.
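Raw depth values are plain integers, so they must be mapped to 0–255 before they can be displayed as gray pixels. Below is a hypothetical linear mapping in plain Java (Processing sketches compile to Java). It assumes that near should be bright and far should be dark. This is only an illustration of the idea. The library’s own getDepthImage() may use a different mapping.

```java
/**
 * Hypothetical illustration: map a raw depth value to a 0-255 gray level.
 * Assumes a simple linear map (near = bright, far = dark). This is NOT
 * necessarily how the library's getDepthImage() builds its image.
 */
class DepthToGray {
    // v1 raw values are 11-bit (0-2047); v2 values go up to 4500.
    static final int MAX_RAW_V1 = 2047;
    static final int MAX_RAW_V2 = 4500;

    static int rawToGray(int raw, int maxRaw) {
        // Clamp out-of-range input so the result always fits in 0-255.
        int clamped = Math.max(0, Math.min(raw, maxRaw));
        // Invert so that small (near) values come out bright.
        return 255 - (int) Math.round(255.0 * clamped / maxRaw);
    }

    public static void main(String[] args) {
        for (int raw : new int[] {0, 1000, 2047}) {
            System.out.println("v1 raw " + raw + " -> gray " + rawToGray(raw, MAX_RAW_V1));
        }
        System.out.println("v2 raw 2250 -> gray " + rawToGray(2250, MAX_RAW_V2));
    }
}
```

Because the map is linear in the raw value, it spreads gray levels evenly across raw values rather than across meters. For the v1, raw values are not proportional to physical distance.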

For the color depth image, use kinect.enableColorDepth(true). Just like with the video image, there is a depth event you can use if necessary:

void depthEvent(Kinect k) {
  // There has been a depth event!
}

Unfortunately, because the RGB camera and the IR camera are not physically located in the same spot, there is a stereo vision problem. Pixel XY in one image does not correspond to the same XY in an image from a camera an inch to the right. The Kinect v2 offers what’s called a “registered” image, which aligns all the depth values with the RGB camera’s pixels. This can be accessed as follows:

PImage img = kinect2.getRegisteredImage();
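To see why no single fixed offset can line the two images up, consider two parallel pinhole cameras separated by a baseline B. A point at distance Z appears shifted horizontally by roughly f · B / Z pixels, where f is the focal length in pixels. The sketch below evaluates that formula. The focal length (580 px) and baseline (0.025 m, about one inch) are assumed example values, not measured Kinect parameters.

```java
/**
 * Rough illustration of the stereo offset between two side-by-side cameras.
 * shift = focalPx * baselineM / depthM (parallel pinhole cameras).
 * The parameter values used in main() are assumptions for illustration.
 */
class ParallaxShift {
    static double shiftInPixels(double focalPx, double baselineM, double depthM) {
        if (depthM <= 0) {
            throw new IllegalArgumentException("depth must be positive");
        }
        return focalPx * baselineM / depthM;
    }

    public static void main(String[] args) {
        double focalPx = 580.0;   // assumed focal length in pixels
        double baselineM = 0.025; // assumed baseline, roughly one inch
        for (double z = 0.5; z <= 4.0; z *= 2) {
            System.out.printf("Z = %.1f m -> shift = %.2f px%n",
                    z, shiftInPixels(focalPx, baselineM, z));
        }
    }
}
```

The shift halves every time the distance doubles. Near objects are misaligned far more than distant ones, which is why registration needs the depth value of each pixel.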

Finally, for the Kinect v1 (but not the v2), you can also adjust the camera angle with the setTilt() method:

float angle = kinect.getTilt();
angle = angle + 1;
kinect.setTilt(angle);

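So, there you have it! Here are the useful functions covered above for the Processing Kinect library:

  1. initDevice(): start the Kinect (v1) or Kinect2 (v2) device
  2. getVideoImage(): the RGB video image as a PImage
  3. enableIR(true): on the v1, make getVideoImage() return the IR image instead
  4. getIrImage(): the IR image as a PImage (v2 only)
  5. getDepthImage(): the grayscale depth image as a PImage
  6. enableColorDepth(true): switch the depth image to color
  7. getRawDepth(): the raw depth values as an int[]
  8. getRegisteredImage(): depth aligned with the RGB camera (v2 only)
  9. getTilt() and setTilt(): read and set the camera angle (v1 only)
  10. videoEvent() and depthEvent(): callbacks for new video and depth frames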

For everything else, you can also take a look at the javadoc reference.

Examples

There are four basic examples for both v1 and v2.

Display RGB, IR, and Depth Images

Code for v1: RGBDepthTest

Code for v2: RGBDepthTest2

This example uses all of the functions listed above to display the data from the Kinect sensor.

Multiple devices

Both v1 and v2 support multiple Kinects.

Code for v1: MultiKinect

Code for v2: MultiKinect2

Point Cloud

Code for v1: PointCloud

Code for v2: PointCloud

Here, we’re doing something a bit fancier. First, we’re using the 3D capabilities of Processing to draw points in space. You’ll want to familiarize yourself with translate(), rotate(), pushMatrix(), and popMatrix(). This tutorial is also a good place to start. In addition, the example uses a PVector to describe a point in 3D space. For more, see the PVector tutorial.

The real work of this example, however, doesn’t come from me at all. The raw depth values from the Kinect are not directly proportional to physical depth. Instead, the inverse of the depth is a linear function of the raw value, according to this formula:

depthInMeters = 1.0 / (rawDepth * -0.0030711016 + 3.3309495161);

Rather than doing this calculation every time, we can precompute all of these values in a lookup table, since there are only 2048 possible raw depth values.

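// One entry per possible 11-bit raw depth value
float[] depthLookUp = new float[2048];

void setup() {
  // Precompute meters for every raw value once, instead of per frame
  for (int i = 0; i < depthLookUp.length; i++) {
    depthLookUp[i] = rawDepthToMeters(i);
  }
}

float rawDepthToMeters(int depthValue) {
  // 2047 means "no reading", so it (and anything above) maps to 0
  if (depthValue < 2047) {
    return (float)(1.0 / ((double)(depthValue) * -0.0030711016 + 3.3309495161));
  }
  return 0.0f;
}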
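The same lookup-table idea is shown below in plain Java, using the constants from the formula above, so it can be run and checked outside Processing. One addition is mine and not part of the original code: a guard for a non-positive denominator. The term raw · 0.0030711016 overtakes 3.3309495161 between raw 1084 and 1085. Above that point, the formula returns negative “distances”, so this sketch maps those values to 0 as well.

```java
/**
 * Plain-Java version of the Kinect v1 raw-depth-to-meters lookup table,
 * using the constants from depth = 1 / (raw * -0.0030711016 + 3.3309495161).
 */
class DepthLookup {
    static final int RAW_VALUES = 2048;
    static final float[] TABLE = new float[RAW_VALUES];

    static {
        // Precompute once so each per-pixel conversion is a single array read.
        for (int i = 0; i < RAW_VALUES; i++) {
            TABLE[i] = rawDepthToMeters(i);
        }
    }

    static float rawDepthToMeters(int raw) {
        if (raw < 0 || raw >= 2047) {
            return 0.0f; // 2047 marks "no reading"
        }
        double denom = raw * -0.0030711016 + 3.3309495161;
        // Extra guard (not in the original): past raw ~1084 the denominator
        // is no longer positive and the result is not a physical depth.
        if (denom <= 0) {
            return 0.0f;
        }
        return (float) (1.0 / denom);
    }

    static float lookup(int raw) {
        return (raw >= 0 && raw < RAW_VALUES) ? TABLE[raw] : 0.0f;
    }

    public static void main(String[] args) {
        for (int raw : new int[] {0, 500, 1000, 2047}) {
            System.out.printf("raw %4d -> %.3f m%n", raw, lookup(raw));
        }
    }
}
```

For example, raw 1000 gives 1 / (3.3309495161 − 3.0711016) ≈ 1 / 0.2598 ≈ 3.85 m.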

Thanks to Matthew Fisher for the above formula. (Note: for more accurate results, you would need to calibrate your specific Kinect, but the formula is close enough for me, so I’m sticking with it for now. More about calibration in a moment.)

Finally, we can draw some points based on the depth values in meters:

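  // Get the raw depth as an array of integers
  int[] depth = kinect.getRawDepth();

  // Only draw every 'skip'-th pixel to keep the frame rate up
  int skip = 4;

  for (int x = 0; x < kinect.width; x += skip) {
    for (int y = 0; y < kinect.height; y += skip) {
      int offset = x + y * kinect.width;

      // Convert kinect data to world xyz coordinate
      // (depthToWorld() is defined elsewhere in the PointCloud example)
      int rawDepth = depth[offset];
      PVector v = depthToWorld(x, y, rawDepth);

      stroke(255);
      pushMatrix();
      // Scale up by 200
      float factor = 200;
      translate(v.x * factor, v.y * factor, factor - v.z * factor);
      // Draw a point
      point(0, 0);
      popMatrix();
    }
  }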
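The loop above relies on depthToWorld(), whose body is not shown here. Below is a minimal sketch of what such a function can do, assuming a standard pinhole camera model. A pixel’s offset from the image center, scaled by depth over focal length, gives its x and y in meters. The intrinsics (FX, FY, CX, CY) are assumed example values for a 640x480 depth image, not calibrated constants. Unlike the example’s version, this hypothetical toWorld() takes depth already converted to meters, for instance from the lookup table.

```java
/**
 * Hypothetical pinhole back-projection from a depth pixel to world XYZ.
 * The intrinsics below are assumed example values; calibrate your own
 * device for real use.
 */
class DepthToWorld {
    static final double FX = 594.2; // assumed focal length x (pixels)
    static final double FY = 591.0; // assumed focal length y (pixels)
    static final double CX = 339.3; // assumed principal point x (pixels)
    static final double CY = 242.7; // assumed principal point y (pixels)

    /** Returns {x, y, z} in meters for pixel (px, py) at depthM meters. */
    static double[] toWorld(int px, int py, double depthM) {
        double x = (px - CX) * depthM / FX;
        double y = (py - CY) * depthM / FY;
        return new double[] {x, y, depthM};
    }

    public static void main(String[] args) {
        double[] center = toWorld(339, 243, 2.0);
        double[] corner = toWorld(0, 0, 2.0);
        System.out.printf("center: (%.3f, %.3f, %.3f)%n", center[0], center[1], center[2]);
        System.out.printf("corner: (%.3f, %.3f, %.3f)%n", corner[0], corner[1], corner[2]);
    }
}
```

A pixel near the principal point lands close to x = y = 0. The same pixel twice as far away lands twice as far from the center axis, which is what spreads the point cloud into a frustum.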

Average Point Tracking

The real magic of the Kinect lies in its computer vision capabilities. With depth information, you can do all sorts of fun things, like saying: “the background is anything beyond 5 feet. Ignore it!” Without depth, background removal involves all sorts of painstaking pixel comparisons. As a quick demonstration of this idea, here is a very basic example that computes the average xy location of any pixels closer than a given depth threshold.
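The “anything beyond 5 feet” rule can be sketched for Kinect v1 raw data by inverting the raw-to-meters formula, depth = 1 / (raw · −0.0030711016 + 3.3309495161), to turn a cutoff in meters into a raw cutoff. Within the valid range, raw values grow with distance, so “closer than the cutoff” simply means “raw value below the cutoff”. This is a minimal illustration, not the library’s method. The pixel array here is a plain int[] standing in for image data.

```java
/**
 * Minimal background-removal sketch for Kinect v1 raw depth: pixels at or
 * beyond a cutoff distance (or with no reading) are set to 0.
 */
class BackgroundRemoval {
    static final double A = -0.0030711016;
    static final double B = 3.3309495161;

    /** Inverse of depth = 1 / (raw * A + B): the raw value for depthM meters. */
    static int metersToRaw(double depthM) {
        if (depthM <= 0) {
            throw new IllegalArgumentException("depth must be positive");
        }
        return (int) Math.round((1.0 / depthM - B) / A);
    }

    /** Assumes rawDepth and pixels have the same length (one entry per pixel). */
    static int[] removeBackground(int[] rawDepth, int[] pixels, int maxRaw) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < rawDepth.length; i++) {
            // Keep only foreground pixels; 2047 means "no reading".
            boolean foreground = rawDepth[i] < maxRaw && rawDepth[i] != 2047;
            out[i] = foreground ? pixels[i] : 0;
        }
        return out;
    }

    public static void main(String[] args) {
        int maxRaw = metersToRaw(1.524); // 5 feet = 1.524 m
        int[] raw = {600, 870, 900, 2047};
        int[] pixels = {11, 22, 33, 44};
        int[] kept = removeBackground(raw, pixels, maxRaw);
        System.out.println("cutoff raw = " + maxRaw);
        for (int i = 0; i < kept.length; i++) {
            System.out.println("raw " + raw[i] + " -> pixel " + kept[i]);
        }
    }
}
```

Converting the cutoff once, rather than converting every pixel to meters, keeps the per-pixel work down to a single integer comparison.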

Source for v1: AveragePointTracking

Source for v2: AveragePointTracking2

In this example, I declare two variables to add up all the appropriate x’s and y’s and one variable to keep track of how many there are.

float sumX = 0;
float sumY = 0;
float count = 0;

Then, whenever I find a point that passes the threshold, I add its x and y to the sums:

  if (rawDepth < threshold) {
    sumX += x;
    sumY += y;
    count++;
  }

When we’re done, we calculate the average and draw a point!

if (count != 0) {
  float avgX = sumX / count;
  float avgY = sumY / count;
  fill(255, 0, 0);
  ellipse(avgX, avgY, 16, 16);
}
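Putting those three pieces together, here is a self-contained plain-Java version of the average point tracker. It scans a raw depth frame stored row by row (index = x + y · width, as in the Processing code) and returns the mean (x, y) of every pixel closer than the threshold. It returns null when nothing passes, mirroring the count != 0 check above. The threshold value used in main() is an arbitrary example.

```java
/**
 * Self-contained sketch of average point tracking: average the (x, y) of
 * every pixel whose raw depth is below a threshold.
 */
class AveragePoint {
    /** Returns {avgX, avgY}, or null if no pixel passes the threshold. */
    static float[] track(int[] depth, int width, int height, int threshold) {
        float sumX = 0;
        float sumY = 0;
        int count = 0;
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                int rawDepth = depth[x + y * width];
                if (rawDepth < threshold) {
                    sumX += x;
                    sumY += y;
                    count++;
                }
            }
        }
        if (count == 0) {
            return null; // nothing close enough: no point to draw
        }
        return new float[] {sumX / count, sumY / count};
    }

    public static void main(String[] args) {
        int w = 4;
        int h = 3;
        int[] depth = new int[w * h];
        for (int i = 0; i < depth.length; i++) {
            depth[i] = 2047; // everything starts out "far" (no reading)
        }
        depth[1 + 1 * w] = 600; // pixel (1, 1) is near
        depth[3 + 1 * w] = 600; // pixel (3, 1) is near
        float[] avg = track(depth, w, h, 745);
        System.out.println("average: " + avg[0] + ", " + avg[1]);
    }
}
```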

What’s missing?

FAQ

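  1. Why are there shadows in the depth image (v1)? The Kinect’s IR projector and IR camera sit a few centimeters apart, so some surfaces visible to the camera are hidden from the projector’s pattern and get no depth reading. (See the Kinect shadow diagram.)
  2. What is the range of depth that the Kinect can see? (v1) ~0.7–6 meters or 2.3–20 feet. Note that you will get black pixels (a raw depth value of 2047) for objects that are either too far away or too close.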