Kinect v1 SDK C++ - 3. Kinect Point Clouds

Have a Kinect v2? Head over to the Kinect v2 Tutorial 3. This tutorial is for the v1 Kinect SDK.

Goals: Learn how to align color and depth images to get a colored point cloud.
Source: View Source      Download: 3_PointCloud.zip


Overview

There are several new steps we want to take in this tutorial. The most interesting part is that now we're working with 3D data! Creating an interactive system is a bit too much code for us, though, so we just have a simple rotating point cloud. This tutorial has three parts: first, we'll talk briefly about why point clouds are harder than you might think. Then, we'll show the Kinect SDK side of how to get the right data. Finally, we'll show some OpenGL tricks to make things easy to display.

Contents

  1. Depth and RGB Coordinate Systems
  2. Kinect Code
  3. OpenGL Display
  4. Putting it all together

Depth and RGB Coordinate Systems

Kinect Coordinate System

The Kinect uses a Cartesian coordinate system centered at the Kinect. The positive Y axis points up, the positive Z axis points in the direction the Kinect is facing, and the positive X axis points to the Kinect's left. Coordinates are in meters, so a point half a meter to the Kinect's left and two meters in front of it sits at (0.5, 0, 2).

Alignment

A naive way of making a point cloud might directly overlap the depth and color images, so that depth pixel (x,y) goes with image pixel (x,y). However, this would give you a poor quality depth map, where the borders of objects don't line up with the colors. This occurs because the RGB camera and the depth camera are located at different spots on the Kinect; obviously, then, they aren't seeing the same things! Normally, we'd have to do some kind of alignment of the two cameras (the formal term is registration) to be able to map from one coordinate space to the other. Fortunately Microsoft has already done this for us, so all we need to do is call the right functions.
[Image: Direct overlap of RGB and depth]
[Image: Registered RGB and depth]
Note that computer vision and robotics researchers don't like the quality of the built-in registration, so they often do it manually using something like OpenCV.

Kinect Code

A lot of this is just combining the code from the first two tutorials.

Kinect Initialization

There's nothing new in initialization. We simply need two image streams, one for depth and one for color.

HANDLE rgbStream;
HANDLE depthStream;
INuiSensor* sensor;

bool initKinect() {
    // Get a working kinect sensor
    int numSensors;
    if (NuiGetSensorCount(&numSensors) < 0 || numSensors < 1) return false;
    if (NuiCreateSensorByIndex(0, &sensor) < 0) return false;

    // Initialize sensor
    sensor->NuiInitialize(NUI_INITIALIZE_FLAG_USES_DEPTH | NUI_INITIALIZE_FLAG_USES_COLOR);

    sensor->NuiImageStreamOpen(
        NUI_IMAGE_TYPE_DEPTH,                     // Depth camera or rgb camera?
        NUI_IMAGE_RESOLUTION_640x480,             // Image resolution
        0,      // Image stream flags, e.g. near mode
        2,      // Number of frames to buffer
        NULL,   // Event handle
        &depthStream);
    sensor->NuiImageStreamOpen(
        NUI_IMAGE_TYPE_COLOR,                     // Depth camera or rgb camera?
        NUI_IMAGE_RESOLUTION_640x480,             // Image resolution
        0,      // Image stream flags, e.g. near mode
        2,      // Number of frames to buffer
        NULL,   // Event handle
        &rgbStream);
    return sensor; // a non-NULL sensor pointer converts to true
}
        
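As in the previous tutorials, initKinect gets called once at startup. A hypothetical main (assuming the init(argc, argv) window/context setup and execute() draw loop from the framework code of the earlier tutorials):

int main(int argc, char* argv[]) {
    if (!init(argc, argv)) return 1; // create the window and OpenGL context
    if (!initKinect()) return 1;     // find and initialize a Kinect sensor
    // ... OpenGL setup from the sections below ...
    execute();                       // enter the draw loop
    return 0;
}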

Getting depth data from the Kinect

The Kinect SDK provides a function that tells you which pixel in the RGB image corresponds to a particular point in the depth image. We'll store this information in another global array, depthToRgbMap. For each depth pixel, in order, we store the column and row (i.e. the x and y coordinates) of the corresponding color pixel.

Now that we're dealing with 3D data, we want to imagine the depth frame as a bunch of points in space rather than a 640x480 image. So in our getDepthData function, we will fill in our buffer with the coordinates of each point (instead of the depth at each pixel). This means the buffer we pass into it has to have size width*height*3*sizeof(float) for float typed coordinates.

// Global Variables
long depthToRgbMap[width*height*2];
// ...
void getDepthData(GLubyte* dest) {
// ...
        const USHORT* curr = (const USHORT*) LockedRect.pBits;
        float* fdest = (float*) dest;
        long* depth2rgb = (long*) depthToRgbMap;
        for (int j = 0; j < height; ++j) {
            for (int i = 0; i < width; ++i) {
                // Depth of this pixel in millimeters (not used below, but
                // this is how you would extract it from the packed value)
                USHORT depth = NuiDepthPixelToDepth(*curr);
                // Store coordinates of the point corresponding to this pixel.
                // Note: we need the overload that takes a resolution, since the
                // three-argument version assumes a 320x240 depth image.
                Vector4 pos = NuiTransformDepthImageToSkeleton(i, j, *curr, NUI_IMAGE_RESOLUTION_640x480);
                *fdest++ = pos.x/pos.w;
                *fdest++ = pos.y/pos.w;
                *fdest++ = pos.z/pos.w;
                // Store the index into the color array corresponding to this pixel
                NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution(
                    NUI_IMAGE_RESOLUTION_640x480, // color frame resolution
                    NUI_IMAGE_RESOLUTION_640x480, // depth frame resolution
                    NULL,                         // pan/zoom of color image (IGNORE THIS)
                    i, j, *curr,                  // Column, row, and depth in depth image
                    depth2rgb, depth2rgb+1        // Output: column and row (x,y) in the color image
                );
                depth2rgb += 2;
                curr++;
            }
        }
// ...
        
A lot of stuff here! The two new API calls do the heavy lifting. NuiTransformDepthImageToSkeleton takes the column, row, and packed depth value of a depth pixel and returns the corresponding point in "skeleton space", the meter-based coordinate system described at the top of this page; the result is a homogeneous Vector4, which is why we divide x, y, and z by w. NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution does the registration for us: given a depth pixel and the resolutions of both frames, it outputs the column and row of the matching pixel in the color image, which we stash in depthToRgbMap for getRgbData to use.
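The "// ..." elisions are the same frame-grabbing boilerplate as in the previous tutorial. For reference, a sketch of the full function shape (using the global sensor and depthStream from initKinect):

void getDepthData(GLubyte* dest) {
    NUI_IMAGE_FRAME imageFrame;
    NUI_LOCKED_RECT LockedRect;
    // Grab the next depth frame, waiting 0 ms (give up if none is ready)
    if (sensor->NuiImageStreamGetNextFrame(depthStream, 0, &imageFrame) < 0) return;
    INuiFrameTexture* texture = imageFrame.pFrameTexture;
    texture->LockRect(0, &LockedRect, NULL, 0);
    if (LockedRect.Pitch != 0) {
        // ... the per-pixel loop shown above ...
    }
    texture->UnlockRect(0);
    sensor->NuiImageStreamReleaseFrame(depthStream, &imageFrame);
}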

Getting color data from the Kinect

Now that we are thinking about things in terms of points instead of rectangular grids, we want our color output to be associated with a particular depth point. In particular, analogously to getDepthData, our getRgbData function wants a buffer of size width*height*3*sizeof(float) to hold the red, green, and blue values for each point in our point cloud.

void getRgbData(GLubyte* dest) {
// ...
        const BYTE* start = (const BYTE*) LockedRect.pBits;
        float* fdest = (float*) dest;
        long* depth2rgb = (long*) depthToRgbMap;
        for (int j = 0; j < height; ++j) {
            for (int i = 0; i < width; ++i) {
                // Determine color pixel for depth pixel i,j
                long x = *depth2rgb++;
                long y = *depth2rgb++;
                // If out of bounds, then do not color this pixel
                if (x < 0 || y < 0 || x >= width || y >= height) {
                    for (int n = 0; n < 3; ++n) *fdest++ = 0.f;
                }
                else {
                    // Determine rgb color for depth pixel (i,j) from color pixel (x,y)
                    const BYTE* color = start + (x+width*y)*4;
                    for (int n = 0; n < 3; ++n) *fdest++ = color[2-n]/255.f;
                }
            }
        }
// ...
        
There's a bit of funny math in those last couple of lines, so let's walk through it. First of all, the color image frame is in BGRA format, one byte per channel, laid out row by row. So the linear index for pixel (x,y) is x + width*y. Then, the 4-byte block we want is at start + linearindex*4. Finally, we want to convert from byte-valued (0-255) BGRA into float-valued (0.0-1.0) RGB, so we reverse the order of the bytes and divide by 255: color[2-n]/255.f.
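As a quick worked example of that indexing, using the start pointer from getRgbData above and made-up coordinates:

// Pixel (x=3, y=2) in a 640-pixel-wide BGRA frame:
// linear index = 3 + 640*2 = 1283, byte offset = 1283*4 = 5132
const BYTE* color = start + (3 + 640*2)*4;
float r = color[2]/255.f; // third byte of B,G,R,A is red
float g = color[1]/255.f; // second byte is green
float b = color[0]/255.f; // first byte is blue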

OpenGL Display

We're going to use array buffers to display our point cloud. What are array buffers? They let you replace a series of glBegin, glColor, glVertex, glEnd calls with a single function call. As a bonus, array buffers are stored on the GPU, so displaying them is more efficient. They do make the code a bit more complicated, though. Want to skip the array buffers? There's an equivalent glBegin/glEnd version at the end of this tutorial.

To use array buffers, we have to deal with OpenGL extensions. To make this easier, we use GLEW.

Installing GLEW

  1. Download and unzip the GLEW binaries from http://glew.sourceforge.net/
  2. Copy the contents of the Include/ and Lib/ directories you just unzipped into the appropriate Windows SDK directories. e.g.
    • C:/Program Files/Microsoft SDKs/Windows/v7.0A/Include/ and C:/Program Files/Microsoft SDKs/Windows/v7.0A/Lib/ for Visual Studio 2010
    • C:/Program Files (x86)/Windows Kits/8.1/Include/um/ and C:/Program Files (x86)/Windows Kits/8.1/Lib/winv6.3/um/ for Visual Studio 2012+
  3. Copy bin/x64/glew32.dll into C:/Windows/System32 and bin/x86/glew32.dll into C:/Windows/SysWOW64. If you have a 32-bit system, just move bin/x86/glew32.dll into C:/Windows/System32
  4. Add glew32.lib to the SDL or OpenGL property sheet's Linker > Input > Additional Dependencies.
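If you are setting up the OpenGL context yourself rather than using the tutorial's framework code, note that GLEW also needs a runtime initialization call after the context is created but before any buffer functions are used; a minimal sketch:

#include <GL/glew.h> // must be included before other OpenGL headers

bool initGlew() {
    // Requires a current OpenGL context, so call this after window creation
    return glewInit() == GLEW_OK;
}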

OpenGL Code

Since we're dealing with 3D data, we now also have to worry about camera settings. We use gluPerspective and gluLookAt to deal with that for us.
// Global variables:
GLuint vboId; // Vertex buffer ID
GLuint cboId; // Color buffer ID

// ...
    // OpenGL setup
    glClearColor(0,0,0,0);
    glClearDepth(1.0f);

    // Set up array buffers
    const int dataSize = width*height*3*sizeof(float); // 3 floats (x,y,z or r,g,b) per point
    glGenBuffers(1, &vboId);
    glBindBuffer(GL_ARRAY_BUFFER, vboId);
    glBufferData(GL_ARRAY_BUFFER, dataSize, 0, GL_DYNAMIC_DRAW);
    glGenBuffers(1, &cboId);
    glBindBuffer(GL_ARRAY_BUFFER, cboId);
    glBufferData(GL_ARRAY_BUFFER, dataSize, 0, GL_DYNAMIC_DRAW);

    // Camera setup
    glViewport(0, 0, width, height);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(45, width /(GLdouble) height, 0.1, 1000);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(0,0,0,0,0,1,0,1,0);
        
For display purposes, rather than having a fully interactive setup, we just have a camera that rotates around the point 3 meters in front of the Kinect. See the code for details, or the sketch below.
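In case you don't want to dig through the source, here is one way such a rotateCamera could look (a sketch; the orbit speed is an arbitrary choice):

#include <cmath> // sin, cos

void rotateCamera() {
    static double angle = 0.;
    static const double radius = 3.; // orbit around (0,0,3), 3 meters in front of the Kinect
    double x = radius*sin(angle);
    double z = radius*(1 - cos(angle)); // keeps the eye 3 m from the pivot
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(x, 0, z,      // eye position on the orbit
              0, 0, radius, // always look at the pivot point
              0, 1, 0);     // up vector
    angle += 0.002;         // small increment per frame
}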

Putting it all together

We wrote those nice functions getDepthData and getRgbData, but how do we use them? What we do is allocate some memory on the GPU and then use our functions to copy our point cloud data there.
void getKinectData() {
    const int dataSize = width*height*3*sizeof(float);
    GLubyte* ptr;
    glBindBuffer(GL_ARRAY_BUFFER, vboId);
    ptr = (GLubyte*) glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (ptr) {
        getDepthData(ptr);
        glUnmapBuffer(GL_ARRAY_BUFFER); // only unmap if mapping succeeded
    }

    glBindBuffer(GL_ARRAY_BUFFER, cboId);
    ptr = (GLubyte*) glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (ptr) {
        getRgbData(ptr);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}
        
Now we want to use the glDrawArrays function to draw our point cloud.
void drawKinectData() {
    getKinectData();
    rotateCamera();

    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);

    glBindBuffer(GL_ARRAY_BUFFER, vboId);
    glVertexPointer(3, GL_FLOAT, 0, NULL);

    glBindBuffer(GL_ARRAY_BUFFER, cboId);
    glColorPointer(3, GL_FLOAT, 0, NULL);

    glPointSize(1.f);
    glDrawArrays(GL_POINTS, 0, width*height);

    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_COLOR_ARRAY);
}
    
Note that we could just as well replace all the array buffer code with
// Global Variables
float colorarray[width*height*3];
float vertexarray[width*height*3];
//...
void getKinectData() {
    getDepthData((GLubyte*) vertexarray);
    getRgbData((GLubyte*) colorarray);
}
void drawKinectData() {
    getKinectData();
    rotateCamera();
    glBegin(GL_POINTS);
    for (int i = 0; i < width*height; ++i) {
        glColor3f(colorarray[i*3], colorarray[i*3+1], colorarray[i*3+2]);
        glVertex3f(vertexarray[i*3], vertexarray[i*3+1], vertexarray[i*3+2]);
    }
    glEnd();
}
    

The End! Build and run, making sure that your Kinect is plugged in. You should see a window containing a rotating color point cloud of what your Kinect sees.

Previous: Depth Images

Next: Skeletal Data