Kinect v1 SDK C++ - 1. Kinect Basics

Have a Kinect v2? Head over to the Kinect v2 Tutorial 1. This tutorial is for the v1 Kinect SDK.

Goals: Learn how to initialize a Kinect and get RGB data from it.
Source: View Source Download: 1_Basics.zip

Overview

We have two real pieces of Kinect-specific code. I will go over these in some detail, and give a fairly high level overview of the display code.

Includes, Constants, and Globals
Kinect Code
Windowing, Event Handling, and Main Loop
Display via OpenGL

Includes, Constants, and Globals

Includes

Mostly self-explanatory. Most Kinect applications need three header files: NuiApi.h, NuiImageCamera.h, and NuiSensor.h.

You need to include Ole2.h and Windows.h for the Kinect includes to work correctly. Don't forget to include the relevant code for your windowing system and OpenGL.
GLUT

#include <Windows.h>
#include <Ole2.h>
#include <gl/GL.h>
#include <gl/GLU.h>
#include <gl/glut.h>
#include <NuiApi.h>
#include <NuiImageCamera.h>
#include <NuiSensor.h>

SDL

#include <Windows.h>
#include <Ole2.h>
#include <SDL_opengl.h>
#include <SDL.h>
#include <NuiApi.h>
#include <NuiImageCamera.h>
#include <NuiSensor.h>


Constants and Global Variables

We define the width and height as 640 and 480, since the Kinect camera delivers 640x480 images.

Note that the data array will hold a copy of the image we get from the Kinect, so that we can use it as a texture. Experienced OpenGL users may want to use a Frame Buffer Object instead.

#define width 640
#define height 480

// OpenGL Variables
GLuint textureId;              // ID of the texture to contain Kinect RGB Data
GLubyte data[width*height*4];  // BGRA array containing the texture data

// Kinect variables
HANDLE rgbStream;              // The identifier of the Kinect's RGB Camera
INuiSensor* sensor;            // The kinect sensor
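
To make the layout of that data array concrete, here is a minimal sketch of how a tightly packed 640x480 BGRA buffer is sized and indexed. The names kWidth, kHeight, kBytesPerPixel, kFrameBytes, pixelOffset, and redAt are hypothetical helpers introduced for illustration; they are not part of the tutorial source or the Kinect SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same dimensions as the tutorial's width/height defines.
constexpr int kWidth = 640;
constexpr int kHeight = 480;
constexpr int kBytesPerPixel = 4;  // B, G, R, A: one byte each

// Total size of one frame in bytes.
constexpr std::size_t kFrameBytes =
    static_cast<std::size_t>(kWidth) * kHeight * kBytesPerPixel;
static_assert(kFrameBytes == 1228800, "640 * 480 * 4 bytes");

// Hypothetical helper: byte offset of pixel (x, y) in a tightly packed buffer.
inline std::size_t pixelOffset(int x, int y) {
    assert(x >= 0 && x < kWidth && y >= 0 && y < kHeight);
    return (static_cast<std::size_t>(y) * kWidth + x) * kBytesPerPixel;
}

// Hypothetical helper: red is byte 2 of a BGRA pixel (B = 0, G = 1, R = 2, A = 3).
inline std::uint8_t redAt(const std::uint8_t* buffer, int x, int y) {
    return buffer[pixelOffset(x, y) + 2];
}
```

At roughly 1.2 MB, the buffer is better kept as a global (as the tutorial does) or on the heap than as a local variable on the stack.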

Kinect Code

Kinect Initialization

This is our first real Kinect-specific code. The initKinect() function initializes a Kinect sensor for use. This consists of two parts: First we find an attached Kinect sensor, then we initialize it and prepare to read data from it.

bool initKinect() {
    // Get a working kinect sensor
    int numSensors;
    if (NuiGetSensorCount(&numSensors) < 0 || numSensors < 1) return false;
    if (NuiCreateSensorByIndex(0, &sensor) < 0) return false;

    // Initialize sensor
    sensor->NuiInitialize(NUI_INITIALIZE_FLAG_USES_DEPTH | NUI_INITIALIZE_FLAG_USES_COLOR);
    sensor->NuiImageStreamOpen(
        NUI_IMAGE_TYPE_COLOR,            // Depth camera or rgb camera?
        NUI_IMAGE_RESOLUTION_640x480,    // Image resolution
        0,      // Image stream flags, e.g. near mode
        2,      // Number of frames to buffer
        NULL,   // Event handle
        &rgbStream);
    return sensor;
}

Things to note:

Normally, we'd be a bit more careful about return values for all of these functions, and also handle the case where there is more than one Kinect sensor available; however, for brevity we will just try to use the first connected Kinect sensor.
The NuiInitialize method takes a bunch of flags specifying which sensor features you are interested in. Right now, we choose the color and depth camera inputs; there are also options for audio input and skeleton input, among others. See the official API for more details.
The NuiImageStreamOpen() method is a little confusing. It initializes a HANDLE that we can later use to get image frames. This function can be used either to set up an RGB color image stream or a depth image stream (based on the first argument). You can ignore the 3rd and 5th arguments for now. Keep the resolution as 640x480 and the buffer size some single digit number. The last argument is a pointer to the HANDLE that we'll use to actually get image frames. See the documentation for more details.
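
As a rough illustration of the first note, the sketch below shows one way to check every return value and fall back to another sensor when the first one fails. It deliberately avoids the Kinect SDK so that it runs anywhere: HResult is a stand-in for the Windows HRESULT type, and failed(), firstWorkingSensor(), and the tryOpen callback are hypothetical names, not SDK functions. In a real application, tryOpen would wrap the create, initialize, and stream-open calls for one sensor index.

```cpp
#include <cassert>
#include <functional>

// Stand-in for the Windows HRESULT type: negative values signal failure,
// which is the same convention the `< 0` checks in initKinect() rely on.
using HResult = long;

inline bool failed(HResult hr) { return hr < 0; }

// Hypothetical helper: try each sensor index in turn and return the first one
// that opens successfully, or -1 if none does. `tryOpen` stands in for the
// per-sensor setup and must return a negative value on failure.
inline int firstWorkingSensor(int sensorCount,
                              const std::function<HResult(int)>& tryOpen) {
    assert(sensorCount >= 0);
    for (int i = 0; i < sensorCount; ++i) {
        if (!failed(tryOpen(i))) return i;
    }
    return -1;
}
```

Returning the index rather than a bool lets the caller report which sensor ended up being used.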

Getting an RGB frame from the Kinect

To actually get a frame from the sensor, we have to fetch it and lock it so it doesn't get corrupted while we're reading it.

void getKinectData(GLubyte* dest) {
    NUI_IMAGE_FRAME imageFrame;
    NUI_LOCKED_RECT LockedRect;
    if (sensor->NuiImageStreamGetNextFrame(rgbStream, 0, &imageFrame) < 0) return;
    INuiFrameTexture* texture = imageFrame.pFrameTexture;
    texture->LockRect(0, &LockedRect, NULL, 0);

There are three types in this short snippet: NUI_IMAGE_FRAME is a structure containing all the metadata about the frame - the number, resolution, etc. NUI_LOCKED_RECT contains a pointer to the actual data. An INuiFrameTexture manages the frame data. So first we acquire a NUI_IMAGE_FRAME from the HANDLE we initialized earlier. Then we get an INuiFrameTexture so that we can get the pixel data out of it, using a NUI_LOCKED_RECT.

Now, we can copy the data to our own memory location. The Pitch of the LockedRect is how many bytes are in each row of the frame; a simple check on that value makes sure that the frame is not empty.

    if (LockedRect.Pitch != 0)
    {
        const BYTE* curr = (const BYTE*) LockedRect.pBits;
        const BYTE* dataEnd = curr + (width*height)*4;

        while (curr < dataEnd) {
            *dest++ = *curr++;
        }
    }

The Kinect data is in BGRA format, so we can copy it directly into our buffer and use it as an OpenGL texture.
Finally, we have to release the frame so that the Kinect can use it again.

    texture->UnlockRect(0);
    sensor->NuiImageStreamReleaseFrame(rgbStream, &imageFrame);
}
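
The copy loop above treats the frame as width*height*4 contiguous bytes, which assumes each row is exactly width*4 bytes long. The tutorial does not say whether Pitch can ever be larger than that, so the following is only a defensive sketch: a row-by-row copy that honors the pitch. copyFrameRows and its parameters are hypothetical names, not part of the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `rows` rows of `rowBytes` bytes each from a source whose rows are
// `srcPitch` bytes apart into a tightly packed destination.
// Returns false (and copies nothing) for an empty frame, mirroring the
// Pitch != 0 check in getKinectData().
inline bool copyFrameRows(const std::uint8_t* src, int srcPitch,
                          std::uint8_t* dest, int rowBytes, int rows) {
    if (src == nullptr || srcPitch == 0) return false;
    assert(srcPitch >= rowBytes);
    for (int y = 0; y < rows; ++y) {
        std::memcpy(dest + static_cast<std::size_t>(y) * rowBytes,
                    src + static_cast<std::size_t>(y) * srcPitch,
                    static_cast<std::size_t>(rowBytes));
    }
    return true;
}
```

With the tutorial's values, a call would look like copyFrameRows(bits, pitch, dest, width*4, height), where bits and pitch come from the locked rectangle.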

Things to note:

Again, we aren't checking return codes on everything. You may want to do this in your applications.
If you're calling this update function too quickly, then the Kinect might not have a new frame that you haven't seen yet. In this case, NuiImageStreamGetNextFrame() will return a negative value. The second argument of NuiImageStreamGetNextFrame() specifies how long to wait (in milliseconds) for a new frame before returning failure.
To reiterate, the workflow is as follows:
1. Acquire a frame: sensor->NuiImageStreamGetNextFrame(). The first argument is the HANDLE we initialized earlier, and the last is a pointer to a NUI_IMAGE_FRAME struct that will contain the frame data. The second argument lets you wait for a frame if a new one is not ready (see above).
2. Lock the pixel data: imageFrame.pFrameTexture->LockRect(). All the arguments must be 0 or NULL except for the second, which is a pointer to a NUI_LOCKED_RECT struct.
3. Copy the data (from LockedRect.pBits).
4. Unlock the pixel data: imageFrame.pFrameTexture->UnlockRect(). The argument must be 0.
5. Release the frame: sensor->NuiImageStreamReleaseFrame(). The first argument is the image stream HANDLE, and the second is a pointer to the image frame that we are releasing.
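
The acquire, lock, copy, unlock, release sequence above pairs every successful acquire with exactly one release. One way to make that pairing hard to get wrong is a scope guard, sketched below. This is not how the tutorial source is written; it is an alternative under stated assumptions. FakeStream is a hypothetical stand-in for the Kinect stream so the sketch runs without the SDK, and FrameLock and grabFrame are likewise invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the Kinect stream. acquire() plays the role of
// NuiImageStreamGetNextFrame + LockRect, and release() the role of
// UnlockRect + NuiImageStreamReleaseFrame.
struct FakeStream {
    const std::uint8_t* pixels = nullptr;  // null means "no new frame yet"
    int acquired = 0;
    int released = 0;

    const std::uint8_t* acquire() {
        if (pixels == nullptr) return nullptr;
        ++acquired;
        return pixels;
    }
    void release() { ++released; }
};

// Scope guard: releases the frame when it goes out of scope, so every
// successful acquire is paired with exactly one release.
template <typename Stream>
class FrameLock {
public:
    explicit FrameLock(Stream& s) : stream_(s), bits_(s.acquire()) {}
    ~FrameLock() { if (bits_ != nullptr) stream_.release(); }
    FrameLock(const FrameLock&) = delete;
    FrameLock& operator=(const FrameLock&) = delete;
    const std::uint8_t* bits() const { return bits_; }
private:
    Stream& stream_;
    const std::uint8_t* bits_;
};

// Copies one frame into dest. Returns false and leaves dest untouched when no
// new frame is available, so the caller keeps displaying the previous image.
template <typename Stream>
bool grabFrame(Stream& stream, std::uint8_t* dest, std::size_t frameBytes) {
    assert(dest != nullptr);
    FrameLock<Stream> lock(stream);
    if (lock.bits() == nullptr) return false;
    std::memcpy(dest, lock.bits(), frameBytes);
    return true;
}
```

Because the release happens in the destructor, adding an early return to grabFrame later cannot leak a locked frame.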

That's all the Kinect code! The rest is just how to get it onscreen.

Windowing, Event Handling, and Main Loop

This section explains the GLUT- or SDL-specific code, consisting of window initialization, event handling, and the main update loop.

The initialization code is specific to which implementation (SDL or GLUT) is used. It simply initializes a window using the appropriate API, returning false on failure. The GLUT version also sets up a main loop by specifying that the draw() function be called every loop iteration.

The main loop is started in the execute() function. In GLUT, the loop is handled behind the scenes, so all we need to do is call the glutMainLoop() function. In SDL we write our own loop. Within each loop iteration, we draw any new frames to the screen; this processing is done in the drawKinectData() function.

There are many references online for both GLUT and SDL if you want to do more complex window and loop management or learn more about these functions.

GLUT

void draw() {
    drawKinectData();
    glutSwapBuffers();
}

void execute() {
    glutMainLoop();
}

bool init(int argc, char* argv[]) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DEPTH | GLUT_DOUBLE | GLUT_RGBA);
    glutInitWindowSize(width,height);
    glutCreateWindow("Kinect SDK Tutorial");
    glutDisplayFunc(draw);
    glutIdleFunc(draw);
    return true;
}

SDL

void execute() {
    SDL_Event ev;
    bool running = true;
    while (running) {
        while (SDL_PollEvent(&ev)) {
            if (ev.type == SDL_QUIT) running = false;
        }
        drawKinectData();
        SDL_GL_SwapBuffers();
    }
}

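bool init(int argc, char* argv[]) {
    if (SDL_Init(SDL_INIT_EVERYTHING) < 0) return false;
    // Double buffering is a GL attribute, not a video mode flag;
    // it must be set before the video mode is created
    SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
    SDL_Surface* screen =
        SDL_SetVideoMode(width, height, 32, SDL_HWSURFACE | SDL_OPENGL);
    return screen != NULL;
}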

Display via OpenGL

Initialization

There are three steps, marked by the comments in the code: setting up the texture to contain our image frame, preparing OpenGL for drawing our texture, and setting up a camera viewpoint (using an orthographic projection for 2D images).

    // Initialize textures
    glGenTextures(1, &textureId);
    glBindTexture(GL_TEXTURE_2D, textureId);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height,
                 0, GL_BGRA, GL_UNSIGNED_BYTE, (GLvoid*) data);
    glBindTexture(GL_TEXTURE_2D, 0);

    // OpenGL setup
    glClearColor(0,0,0,0);
    glClearDepth(1.0f);
    glEnable(GL_TEXTURE_2D);

    // Camera setup
    glViewport(0, 0, width, height);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, width, height, 0, 1, -1);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

Obviously, we should wrap this up in a function. I just put it into main() for brevity.

int main(int argc, char* argv[]) {
    if (!init(argc, argv)) return 1;
    if (!initKinect()) return 1;

    /* ...OpenGL texture and camera initialization... */

    // Main loop
    execute();
    return 0;
}
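
To see what the orthographic projection in the camera setup does, the sketch below works out where a pixel coordinate ends up in normalized device coordinates under glOrtho(0, width, height, 0, 1, -1). It is plain arithmetic derived from the glOrtho parameters (left = 0, right = width, bottom = height, top = 0) and does not call OpenGL; Ndc and pixelToNdc are hypothetical names used only for illustration.

```cpp
#include <cassert>

struct Ndc { float x; float y; };

// Maps a pixel coordinate to normalized device coordinates under
// glOrtho(0, w, h, 0, 1, -1): left = 0, right = w, bottom = h, top = 0.
inline Ndc pixelToNdc(float px, float py, float w, float h) {
    assert(w > 0.0f && h > 0.0f);
    return Ndc{ 2.0f * px / w - 1.0f,    // 0 -> -1 (left edge), w -> +1 (right edge)
                1.0f - 2.0f * py / h };  // 0 -> +1 (top edge),  h -> -1 (bottom edge)
}
```

Because bottom and top are swapped relative to the usual OpenGL convention, y grows downward, so pixel (0, 0) is the top-left corner of the window and (width, height) is the bottom-right.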

Drawing a frame to screen

This is very standard code. We first copy the Kinect data into our own buffer, then upload that buffer into our texture.

void drawKinectData() {
    glBindTexture(GL_TEXTURE_2D, textureId);
    getKinectData(data);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, (GLvoid*)data);

Then, we draw a rectangle that is textured with our frame.

    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f);
        glVertex3f(0, 0, 0);
        glTexCoord2f(1.0f, 0.0f);
        glVertex3f(width, 0, 0);
        glTexCoord2f(1.0f, 1.0f);
        glVertex3f(width, height, 0.0f);
        glTexCoord2f(0.0f, 1.0f);
        glVertex3f(0, height, 0.0f);
    glEnd();
}

The End! Build and run, making sure that your Kinect is plugged in. You should see a window containing a video stream of what your Kinect sees.
