Body Tracking Overview

The body tracking module detects and tracks a person's bones. A detected bone is represented by its two endpoints, also called keypoints. The ZED camera can provide 2D and 3D information on each detected keypoint. It also produces the local rotation between neighboring bones.

How It Works

The overall process is very similar to that of the ZED SDK Object Detection module, and the two modules share some outputs, such as the 3D position and 3D velocity of each person. The body tracking module also uses a neural network to detect keypoints, then calls the depth and positional tracking modules of the ZED SDK to get the final 3D position of each keypoint.

The ZED SDK supports multiple body formats:

The BODY_18 body format contains 18 keypoints following the COCO18 skeleton representation.

Each keypoint is indexed by an integer from 0 to 17:

| Keypoint index | Keypoint name | Keypoint index | Keypoint name |
|---|---|---|---|
| 0 | NOSE | 9 | RIGHT_KNEE |
| 1 | NECK | 10 | RIGHT_ANKLE |
| 2 | RIGHT_SHOULDER | 11 | LEFT_HIP |
| 3 | RIGHT_ELBOW | 12 | LEFT_KNEE |
| 4 | RIGHT_WRIST | 13 | LEFT_ANKLE |
| 5 | LEFT_SHOULDER | 14 | RIGHT_EYE |
| 6 | LEFT_ELBOW | 15 | LEFT_EYE |
| 7 | LEFT_WRIST | 16 | RIGHT_EAR |
| 8 | RIGHT_HIP | 17 | LEFT_EAR |
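To make the index-to-name mapping concrete, the BODY_18 table above can be transcribed into an enumeration. This is a minimal sketch: `Body18` and `keypoint_name` are illustrative names chosen here, not identifiers from the ZED SDK, and only the index/name pairs come from the table.

```python
from enum import IntEnum


class Body18(IntEnum):
    """BODY_18 keypoint indices, transcribed from the table (illustrative name)."""
    NOSE = 0
    NECK = 1
    RIGHT_SHOULDER = 2
    RIGHT_ELBOW = 3
    RIGHT_WRIST = 4
    LEFT_SHOULDER = 5
    LEFT_ELBOW = 6
    LEFT_WRIST = 7
    RIGHT_HIP = 8
    RIGHT_KNEE = 9
    RIGHT_ANKLE = 10
    LEFT_HIP = 11
    LEFT_KNEE = 12
    LEFT_ANKLE = 13
    RIGHT_EYE = 14
    LEFT_EYE = 15
    RIGHT_EAR = 16
    LEFT_EAR = 17


def keypoint_name(index: int) -> str:
    """Return the BODY_18 name for an index; raises ValueError outside 0..17."""
    return Body18(index).name
```

With this mapping, a keypoint array indexed from 0 to 17 can be read by name, for example `keypoints[Body18.LEFT_WRIST]`.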

The BODY_34 body format contains 34 keypoints following this configuration:

Each keypoint is indexed by an integer from 0 to 33:

| Keypoint index | Keypoint name | Keypoint index | Keypoint name | Keypoint index | Keypoint name | Keypoint index | Keypoint name |
|---|---|---|---|---|---|---|---|
| 0 | PELVIS | 9 | LEFT_HANDTIP | 18 | LEFT_HIP | 27 | NOSE |
| 1 | NAVAL_SPINE | 10 | LEFT_THUMB | 19 | LEFT_KNEE | 28 | LEFT_EYE |
| 2 | CHEST_SPINE | 11 | RIGHT_CLAVICLE | 20 | LEFT_ANKLE | 29 | LEFT_EAR |
| 3 | NECK | 12 | RIGHT_SHOULDER | 21 | LEFT_FOOT | 30 | RIGHT_EYE |
| 4 | LEFT_CLAVICLE | 13 | RIGHT_ELBOW | 22 | RIGHT_HIP | 31 | RIGHT_EAR |
| 5 | LEFT_SHOULDER | 14 | RIGHT_WRIST | 23 | RIGHT_KNEE | 32 | LEFT_HEEL |
| 6 | LEFT_ELBOW | 15 | RIGHT_HAND | 24 | RIGHT_ANKLE | 33 | RIGHT_HEEL |
| 7 | LEFT_WRIST | 16 | RIGHT_HANDTIP | 25 | RIGHT_FOOT | | |
| 8 | LEFT_HAND | 17 | RIGHT_THUMB | 26 | HEAD | | |

The BODY_38 body format contains 38 keypoints following this configuration:

Each keypoint is indexed by an integer from 0 to 37:

| Keypoint index | Keypoint name | Keypoint index | Keypoint name | Keypoint index | Keypoint name | Keypoint index | Keypoint name |
|---|---|---|---|---|---|---|---|
| 0 | PELVIS | 10 | LEFT_CLAVICLE | 20 | LEFT_KNEE | 30 | LEFT_HAND_THUMB_4 |
| 1 | SPINE_1 | 11 | RIGHT_CLAVICLE | 21 | RIGHT_KNEE | 31 | RIGHT_HAND_THUMB_4 |
| 2 | SPINE_2 | 12 | LEFT_SHOULDER | 22 | LEFT_ANKLE | 32 | LEFT_HAND_INDEX_1 |
| 3 | SPINE_3 | 13 | RIGHT_SHOULDER | 23 | RIGHT_ANKLE | 33 | RIGHT_HAND_INDEX_1 |
| 4 | NECK | 14 | LEFT_ELBOW | 24 | LEFT_BIG_TOE | 34 | LEFT_HAND_MIDDLE_4 |
| 5 | NOSE | 15 | RIGHT_ELBOW | 25 | RIGHT_BIG_TOE | 35 | RIGHT_HAND_MIDDLE_4 |
| 6 | LEFT_EYE | 16 | LEFT_WRIST | 26 | LEFT_SMALL_TOE | 36 | LEFT_HAND_PINKY_1 |
| 7 | RIGHT_EYE | 17 | RIGHT_WRIST | 27 | RIGHT_SMALL_TOE | 37 | RIGHT_HAND_PINKY_1 |
| 8 | LEFT_EAR | 18 | LEFT_HIP | 28 | LEFT_HEEL | | |
| 9 | RIGHT_EAR | 19 | RIGHT_HIP | 29 | RIGHT_HEEL | | |

The BODY_70 body format contains 70 keypoints following this configuration:

Each keypoint is indexed by an integer from 0 to 69:

| Keypoint index | Keypoint name | Keypoint index | Keypoint name | Keypoint index | Keypoint name | Keypoint index | Keypoint name |
|---|---|---|---|---|---|---|---|
| 0 | PELVIS | 18 | LEFT_HIP | 36 | LEFT_HAND_INDEX_3 | 54 | RIGHT_HAND_INDEX_1 |
| 1 | SPINE_1 | 19 | RIGHT_HIP | 37 | LEFT_HAND_INDEX_4 | 55 | RIGHT_HAND_INDEX_2 |
| 2 | SPINE_2 | 20 | LEFT_KNEE | 38 | LEFT_HAND_MIDDLE_1 | 56 | RIGHT_HAND_INDEX_3 |
| 3 | SPINE_3 | 21 | RIGHT_KNEE | 39 | LEFT_HAND_MIDDLE_2 | 57 | RIGHT_HAND_INDEX_4 |
| 4 | NECK | 22 | LEFT_ANKLE | 40 | LEFT_HAND_MIDDLE_3 | 58 | RIGHT_HAND_MIDDLE_1 |
| 5 | NOSE | 23 | RIGHT_ANKLE | 41 | LEFT_HAND_MIDDLE_4 | 59 | RIGHT_HAND_MIDDLE_2 |
| 6 | LEFT_EYE | 24 | LEFT_BIG_TOE | 42 | LEFT_HAND_RING_1 | 60 | RIGHT_HAND_MIDDLE_3 |
| 7 | RIGHT_EYE | 25 | RIGHT_BIG_TOE | 43 | LEFT_HAND_RING_2 | 61 | RIGHT_HAND_MIDDLE_4 |
| 8 | LEFT_EAR | 26 | LEFT_SMALL_TOE | 44 | LEFT_HAND_RING_3 | 62 | RIGHT_HAND_RING_1 |
| 9 | RIGHT_EAR | 27 | RIGHT_SMALL_TOE | 45 | LEFT_HAND_RING_4 | 63 | RIGHT_HAND_RING_2 |
| 10 | LEFT_CLAVICLE | 28 | LEFT_HEEL | 46 | LEFT_HAND_PINKY_1 | 64 | RIGHT_HAND_RING_3 |
| 11 | RIGHT_CLAVICLE | 29 | RIGHT_HEEL | 47 | LEFT_HAND_PINKY_2 | 65 | RIGHT_HAND_RING_4 |
| 12 | LEFT_SHOULDER | 30 | LEFT_HAND_THUMB_1 | 48 | LEFT_HAND_PINKY_3 | 66 | RIGHT_HAND_PINKY_1 |
| 13 | RIGHT_SHOULDER | 31 | LEFT_HAND_THUMB_2 | 49 | LEFT_HAND_PINKY_4 | 67 | RIGHT_HAND_PINKY_2 |
| 14 | LEFT_ELBOW | 32 | LEFT_HAND_THUMB_3 | 50 | RIGHT_HAND_THUMB_1 | 68 | RIGHT_HAND_PINKY_3 |
| 15 | RIGHT_ELBOW | 33 | LEFT_HAND_THUMB_4 | 51 | RIGHT_HAND_THUMB_2 | 69 | RIGHT_HAND_PINKY_4 |
| 16 | LEFT_WRIST | 34 | LEFT_HAND_INDEX_1 | 52 | RIGHT_HAND_THUMB_3 | | |
| 17 | RIGHT_WRIST | 35 | LEFT_HAND_INDEX_2 | 53 | RIGHT_HAND_THUMB_4 | | |

The ZED SDK can output 3 levels of information: raw 2D/3D body detection, 3D body tracking and 3D body fitting.

2D/3D Body detection

The ZED SDK first uses the ZED camera image to infer all 2D bones and keypoints using neural networks. The SDK depth module and positional tracking module are then used together to extract the 3D position of each bone and keypoint.
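As an illustration of how a 2D keypoint and a depth value can be combined into a 3D point, here is a minimal pinhole back-projection sketch. It is not the SDK's internal procedure, which is not described here. The function name `backproject` and the intrinsics used in the example (`fx`, `fy`, `cx`, `cy`) are assumptions made for the illustration, and the result is expressed in the camera frame only, before any positional tracking transform.

```python
from typing import Tuple


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with depth Z to a camera-frame point under a pinhole model.

    fx, fy are focal lengths in pixels and cx, cy the principal point;
    all four are assumed inputs for this sketch.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Made-up intrinsics and keypoint, for illustration only.
point = backproject(u=990.0, v=360.0, depth=2.0,
                    fx=700.0, fy=700.0, cx=640.0, cy=360.0)
```

A keypoint located at the principal point maps to a point straight ahead of the camera, and the lateral offset grows linearly with both the pixel offset and the depth.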

3D body tracking

If tracking is enabled, the ZED SDK assigns an identity to each detected body and maintains it over time. It also filters the raw body detections to output a more stable 3D body estimate.
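The effect of filtering per tracked identity can be sketched with a simple exponential moving average. This is a stand-in technique chosen for illustration, since the filter actually used by the SDK is not specified here. `KeypointSmoother` and its `alpha` parameter are hypothetical names.

```python
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]


class KeypointSmoother:
    """Exponential moving average of 3D keypoints, kept separately per body ID."""

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state: Dict[int, List[Point3]] = {}

    def update(self, body_id: int, keypoints: List[Point3]) -> List[Point3]:
        """Blend new keypoints with the previous estimate stored for this ID."""
        previous = self._state.get(body_id)
        if previous is None or len(previous) != len(keypoints):
            # First observation of this ID (or format change): nothing to blend with.
            smoothed = list(keypoints)
        else:
            a = self.alpha
            smoothed = [
                tuple(a * new + (1.0 - a) * old for new, old in zip(p_new, p_old))
                for p_new, p_old in zip(keypoints, previous)
            ]
        self._state[body_id] = smoothed
        return smoothed
```

Keying the state by ID is what makes the identity assignment matter: without a stable ID, consecutive detections of the same person could not be blended together.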

3D body fitting

Fitting can also be enabled to obtain more information about each tracked identity. The fitting process uses the history of each tracked person to deduce all missing keypoints, relying on the human kinematic constraints used by the body tracking module. It can also extract the local rotation between a pair of neighboring bones by solving the inverse kinematics problem. This data is compatible with some existing avatar animation software. For example, BODY_FORMAT::BODY_34 was used to animate an avatar in Unreal.
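To show what a local rotation between two neighboring bones looks like in [x, y, z, w] form, the sketch below computes the shortest-arc quaternion that turns one bone direction onto the other. This is a deliberately simpler technique than the inverse kinematics solve described above: it ignores kinematic constraints and leaves the twist around the bone undetermined. `rotation_between_bones` is a hypothetical helper, not an SDK function.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # component order (x, y, z, w)


def _unit(v: Sequence[float]) -> Vec3:
    """Normalize a 3D vector; a zero-length bone has no direction."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("bone direction has zero length")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotation_between_bones(parent_dir: Sequence[float],
                           child_dir: Sequence[float]) -> Quat:
    """Unit quaternion rotating the parent bone direction onto the child's."""
    a, b = _unit(parent_dir), _unit(child_dir)
    dot = sum(x * y for x, y in zip(a, b))
    if dot < -1.0 + 1e-9:
        # Opposite directions: rotate 180 degrees about any axis perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = _unit(_cross(a, helper))
        return (axis[0], axis[1], axis[2], 0.0)
    x, y, z = _cross(a, b)
    w = 1.0 + dot
    norm = math.sqrt(x * x + y * y + z * z + w * w)
    return (x / norm, y / norm, z / norm, w / norm)
```

Two aligned bones give the identity quaternion, with all of the weight in the w component, which is a convenient sanity check when feeding rotations to an animation rig.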

Detection Outputs

Each detected person is stored as a structure in the ZED SDK called sl.BodyData.

| Body Data | Description | Output |
|---|---|---|
| ID | Fixed ID for identifying an object over time. | Integer |
| Tracking state | Defines if an object is currently tracked or lost. | Ok, Off, Searching, Terminate |
| Action state | Defines if an object is currently idle or moving. | Idle, Moving |
| Position | Provides the 3D position of the object according to the camera as a 3D vector (x,y,z). | [x, y, z] |
| Velocity | Provides the velocity of the object in space as a 3D vector (x,y,z). | [vx, vy, vz] |
| Dimensions | Provides the width, height and length of the object. | [width, height, length] |
| Detection confidence | A lower confidence means the object might not be localized perfectly or that its label is uncertain. | 0 - 100 |
| 2D bounding box | Defines the box surrounding the object in the image represented as four 2D points. | Four pixel coordinates |
| 3D bounding box | Defines the box surrounding the object in space represented as eight 3D points. | Eight 3D coordinates |
| Mask | Provides the pixels which really belong to the object and those of the background. | Binary mask |
| 2D keypoint | A set of useful points representing the human body, expressed in 2D. | A vector of [x, y] |
| Keypoint | A set of useful points representing the human body, expressed in 3D. | A vector of [x, y, z] |
| 2D head bounding box | Bounds the head with four 2D points. | Four pixel coordinates |
| 3D head bounding box | Bounds the head with eight 3D points. | Eight 3D coordinates |
| Head position | 3D head centroid. | [x, y, z] |
| Keypoint confidence | Per keypoint detection confidence. | A vector of float |
| Local position per joint | Local position of each keypoint. | A vector of [x, y, z] |
| Local orientation per joint | Local rotation of each keypoint. | A vector of [x, y, z, w] |
| Global root orientation | Global root orientation of the body. | [x, y, z, w] |
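A common use of these outputs is to keep only the keypoints whose confidence is high enough. The sketch below uses a hypothetical stand-in class, `BodySample`, that mirrors three of the fields listed above. It is not the SDK's sl.BodyData structure, and the field names, the threshold, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class BodySample:
    """Hypothetical stand-in holding a few of the per-body fields from the table."""
    id: int
    keypoint: List[Point3]            # 3D keypoints, one per keypoint index
    keypoint_confidence: List[float]  # one confidence value per keypoint


def reliable_keypoints(body: BodySample, min_confidence: float) -> Dict[int, Point3]:
    """Map keypoint index to 3D position for keypoints at or above the threshold."""
    return {
        index: point
        for index, (point, confidence)
        in enumerate(zip(body.keypoint, body.keypoint_confidence))
        if confidence >= min_confidence
    }


# Made-up sample: three keypoints, the second one poorly detected.
sample = BodySample(
    id=3,
    keypoint=[(0.0, 1.6, 2.0), (0.1, 1.4, 2.0), (0.3, 1.4, 2.1)],
    keypoint_confidence=[92.0, 12.0, 75.0],
)
kept = reliable_keypoints(sample, min_confidence=50.0)
```

Returning a dictionary keyed by keypoint index, rather than a shortened list, preserves the link between each position and its entry in the body format tables.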

For more information on Body Tracking, see the Using the API page.