Kinect v2 SDK C++ - 4. Kinect Body Tracking

Goals: Learn how to get body tracking data, specifically joint positions, from the Kinect.
Source: View Source      Download: 4_SkeletalTracking.zip


Overview

This is a fairly simple tutorial that shows you how to get basic information about human bodies in the view of the Kinect. We will show how to extract the 3D positions of the body's joints, which can then be further processed to do things as simple as drawing a skeleton or as complex as gesture recognition.

To do this, we will start with the framework we made in the point cloud tutorial and add on a few bits.

Contents

  1. Kinect Code
  2. OpenGL Display

Kinect Code

Global Variables

We will keep as global variables a boolean telling us whether we see a body (and thus whether or not to draw the arms), as well as an array of all the Joints in the last seen body.
// Body tracking variables
BOOLEAN tracked;                            // Whether we see a body
Joint joints[JointType_Count];              // List of joints in the tracked body
        
The joints array contains one Joint structure for each tracked joint in the body. The documentation lists the JointType_Count = 25 joints that are tracked by the Kinect. For example, we can get the location of the user's right elbow by looking at joints[JointType_ElbowRight].Position.
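As a small illustration of working with joint positions, the sketch below computes the distance between two joints, for example the elbow and the wrist to estimate forearm length. So that it compiles without the SDK, it uses a stand-in Point3 struct that mirrors the X, Y, Z fields of CameraSpacePoint; Point3 and jointDistance are hypothetical names, not part of the Kinect SDK.

```cpp
#include <cmath>

// Stand-in for the SDK's CameraSpacePoint (assumption: same X, Y, Z float fields),
// used here so the sketch builds without Kinect.h.
struct Point3 {
    float X, Y, Z;
};

// Euclidean distance between two joint positions.
// The result is in the same units as the inputs (camera space uses meters).
float jointDistance(const Point3& a, const Point3& b) {
    const float dx = a.X - b.X;
    const float dy = a.Y - b.Y;
    const float dz = a.Z - b.Z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```

With the real SDK types, the same arithmetic would apply to joints[JointType_ElbowRight].Position and joints[JointType_WristRight].Position.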

Kinect Initialization

The only new thing here: when we open our MultiSourceFrameReader, we also request body tracking data.
IKinectSensor* sensor;             // Kinect sensor
IMultiSourceFrameReader* reader;   // Kinect data source
ICoordinateMapper* mapper;         // Converts between depth, color, and 3d coordinates

bool initKinect() {
    if (FAILED(GetDefaultKinectSensor(&sensor))) {
        return false;
    }
    if (sensor) {
        sensor->get_CoordinateMapper(&mapper);

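        sensor->Open();
        // Request depth, color, and body data from a single reader
        HRESULT hr = sensor->OpenMultiSourceFrameReader(
                  FrameSourceTypes::FrameSourceTypes_Depth
                | FrameSourceTypes::FrameSourceTypes_Color
                | FrameSourceTypes::FrameSourceTypes_Body,
            &reader);
        // Succeed only if the call worked and we actually got a reader
        return SUCCEEDED(hr) && reader != NULL;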
    } else {
        return false;
    }
}
        

Getting joint data from the Kinect

In our new getBodyData function, we still request the frame from the MultiSourceFrame as before.
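void getBodyData(IMultiSourceFrame* frame) {
    IBodyFrame* bodyframe = NULL;
    IBodyFrameReference* frameref = NULL;
    frame->get_BodyFrameReference(&frameref);
    if (!frameref) return;
    frameref->AcquireFrame(&bodyframe);
    frameref->Release();

    if (!bodyframe) return;

    // ------ NEW CODE ------
    // Every entry must be NULL (or a valid IBody pointer) before the call
    IBody* body[BODY_COUNT] = {0};
    tracked = FALSE;
    if (SUCCEEDED(bodyframe->GetAndRefreshBodyData(BODY_COUNT, body))) {
        for (int i = 0; i < BODY_COUNT; i++) {
            if (!body[i]) continue;
            body[i]->get_IsTracked(&tracked);
            if (tracked) {
                // Copy the joints of the first tracked body and stop looking
                body[i]->GetJoints(JointType_Count, joints);
                break;
            }
        }
    }
    // Release the IBody references we were handed
    for (int i = 0; i < BODY_COUNT; i++) {
        if (body[i]) body[i]->Release();
    }
    // ------ END NEW CODE ------

    bodyframe->Release();
}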
The Kinect can track up to BODY_COUNT people simultaneously (in the SDK, BODY_COUNT == 6). We populate an array of IBody pointers using the IBodyFrame::GetAndRefreshBodyData function. Note that an IBody does not necessarily refer to an actual person that the Kinect can see, and the first tracked body is not necessarily the first array element. Thus, we need to check whether each element is a tracked body.

For simplicity, we only want to deal with one person in this app at a time. So we check whether each of the returned bodies refers to a tracked person, and simply break out of the loop after we find one. We populate the joints array with the positions of all of the joints in the body with the IBody::GetJoints function. We also keep the tracked state in our global tracked variable.

Note: Always call IBodyFrame::GetAndRefreshBodyData with BODY_COUNT as the first argument, and IBody::GetJoints with JointType_Count as the first argument.
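
The "first tracked body" selection described above can be sketched in isolation. The helper below is hypothetical (firstTrackedIndex is not an SDK function) and works on a plain array of flags rather than IBody pointers, so it can be compiled without the Kinect SDK. It only illustrates why the loop has to scan every slot instead of assuming slot 0 holds the person.

```cpp
// Returns the index of the first slot whose flag is set,
// or -1 when no slot is tracked (or the input is empty or null).
// Assumption: trackedFlags[i] holds what IBody::get_IsTracked reported for slot i.
int firstTrackedIndex(const bool* trackedFlags, int count) {
    if (trackedFlags == nullptr || count <= 0) {
        return -1;
    }
    for (int i = 0; i < count; i++) {
        if (trackedFlags[i]) {
            return i;  // stop at the first tracked slot, like the break in getBodyData
        }
    }
    return -1;
}
```

A return value of -1 corresponds to the case where the tracked global ends up false and nothing should be drawn.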


OpenGL Display

We will simply take the coordinates given to us in the Joint array, and draw some lines showing the body's arms. This means that we will draw one line from the right shoulder to the right elbow, then one from the right elbow to the right wrist, and likewise for the left side. Of course, we only want to do this if the Kinect can see someone, so we check the tracked global boolean first.

The joint positions are represented as 3D CameraSpacePoints, in the Position field of the Joint structure.

void drawKinectData() {
    // ...
    if (tracked) {
        // Draw some arms
        const CameraSpacePoint& lh = joints[JointType_WristLeft].Position;
        const CameraSpacePoint& le = joints[JointType_ElbowLeft].Position;
        const CameraSpacePoint& ls = joints[JointType_ShoulderLeft].Position;
        const CameraSpacePoint& rh = joints[JointType_WristRight].Position;
        const CameraSpacePoint& re = joints[JointType_ElbowRight].Position;
        const CameraSpacePoint& rs = joints[JointType_ShoulderRight].Position;
        glBegin(GL_LINES);
        glColor3f(1.f, 0.f, 0.f);
        // lower left arm
        glVertex3f(lh.X, lh.Y, lh.Z);
        glVertex3f(le.X, le.Y, le.Z);
        // upper left arm
        glVertex3f(le.X, le.Y, le.Z);
        glVertex3f(ls.X, ls.Y, ls.Z);
        // lower right arm
        glVertex3f(rh.X, rh.Y, rh.Z);
        glVertex3f(re.X, re.Y, re.Z);
        // upper right arm
        glVertex3f(re.X, re.Y, re.Z);
        glVertex3f(rs.X, rs.Y, rs.Z);
        glEnd();
    }
}
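
If you later want to draw more of the skeleton, listing every glVertex3f call by hand gets repetitive. One possible alternative, sketched here under stated assumptions, is to keep a table of bones (pairs of joints) and flatten it into line endpoints. Point3, ArmJoint, kArmBones, and armLineVertices are all hypothetical names for this sketch; real code would index the joints array with the SDK's JointType_* values and pass each endpoint to glVertex3f between glBegin(GL_LINES) and glEnd().

```cpp
#include <utility>
#include <vector>

// Stand-in for the SDK's CameraSpacePoint (assumption: same X, Y, Z float fields).
struct Point3 {
    float X, Y, Z;
};

// Hypothetical joint indices covering only the arms.
enum ArmJoint {
    ShoulderLeft, ElbowLeft, WristLeft,
    ShoulderRight, ElbowRight, WristRight,
    ArmJointCount
};

// Each pair is one bone, in the same order the tutorial draws them.
const std::pair<ArmJoint, ArmJoint> kArmBones[] = {
    {WristLeft,  ElbowLeft},     // lower left arm
    {ElbowLeft,  ShoulderLeft},  // upper left arm
    {WristRight, ElbowRight},    // lower right arm
    {ElbowRight, ShoulderRight}  // upper right arm
};

// Flattens the bone table into endpoints: two vertices per bone,
// which is the layout GL_LINES expects.
std::vector<Point3> armLineVertices(const Point3 (&joints)[ArmJointCount]) {
    std::vector<Point3> verts;
    for (const auto& bone : kArmBones) {
        verts.push_back(joints[bone.first]);
        verts.push_back(joints[bone.second]);
    }
    return verts;
}
```

Adding a leg or the spine then only means adding entries to the table, not more drawing code.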
    

The End! Build and run, making sure that your Kinect is plugged in. You should see a window containing a rotating color point cloud of what your Kinect sees, with red lines showing the arms of a person if there is one in the view of the Kinect.