Body Tracking Overview

The body tracking module detects and tracks a person's bones. A detected bone is represented by its two endpoints, also called keypoints. The ZED camera can provide 2D and 3D information on each detected keypoint. Furthermore, it produces the local rotation between neighboring bones.

How It Works

The overall process is very similar to the ZED SDK Object Detection module, and the two modules share some outputs, such as the 3D position and 3D velocity of each person. The body tracking module uses a neural network to detect keypoints, then relies on the depth and positional tracking modules of the ZED SDK to compute the final 3D position of each keypoint.

The ZED SDK supports multiple body formats:

The BODY_18 body format contains 18 keypoints following the COCO18 skeleton representation.

Each keypoint is indexed by an integer from 0 to 17:

| keypoint index | keypoint name | keypoint index | keypoint name |
|---|---|---|---|
| 0 | NOSE | 9 | RIGHT_KNEE |
| 1 | NECK | 10 | RIGHT_ANKLE |
| 2 | RIGHT_SHOULDER | 11 | LEFT_HIP |
| 3 | RIGHT_ELBOW | 12 | LEFT_KNEE |
| 4 | RIGHT_WRIST | 13 | LEFT_ANKLE |
| 5 | LEFT_SHOULDER | 14 | RIGHT_EYE |
| 6 | LEFT_ELBOW | 15 | LEFT_EYE |
| 7 | LEFT_WRIST | 16 | RIGHT_EAR |
| 8 | RIGHT_HIP | 17 | LEFT_EAR |
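
Because each keypoint is addressed by its integer index, a named lookup can make client code easier to read. The sketch below mirrors the BODY_18 table in a plain Python enum. The names `Body18` and `keypoint_name`, and the sample keypoint list, are hypothetical and only for illustration; they are not part of the ZED SDK, which ships its own enumerations.

```python
from enum import IntEnum


class Body18(IntEnum):
    """Illustrative stand-in for the BODY_18 indices (not the SDK's own enum)."""
    NOSE = 0
    NECK = 1
    RIGHT_SHOULDER = 2
    RIGHT_ELBOW = 3
    RIGHT_WRIST = 4
    LEFT_SHOULDER = 5
    LEFT_ELBOW = 6
    LEFT_WRIST = 7
    RIGHT_HIP = 8
    RIGHT_KNEE = 9
    RIGHT_ANKLE = 10
    LEFT_HIP = 11
    LEFT_KNEE = 12
    LEFT_ANKLE = 13
    RIGHT_EYE = 14
    LEFT_EYE = 15
    RIGHT_EAR = 16
    LEFT_EAR = 17


def keypoint_name(index: int) -> str:
    """Return the BODY_18 name for an index; raises ValueError if out of range."""
    return Body18(index).name


# Hypothetical keypoint list: one [x, y, z] entry per index, in index order.
keypoints = [[float(i), 0.0, 0.0] for i in range(len(Body18))]

# IntEnum members can be used directly as list indices.
left_wrist = keypoints[Body18.LEFT_WRIST]
```

The same pattern would apply to the BODY_34 and BODY_38 formats with their respective index tables.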

The BODY_34 body format contains 34 keypoints following this configuration:

Each keypoint is indexed by an integer from 0 to 33:

| keypoint index | keypoint name | keypoint index | keypoint name | keypoint index | keypoint name | keypoint index | keypoint name |
|---|---|---|---|---|---|---|---|
| 0 | PELVIS | 9 | LEFT_HANDTIP | 18 | LEFT_HIP | 27 | NOSE |
| 1 | NAVAL_SPINE | 10 | LEFT_THUMB | 19 | LEFT_KNEE | 28 | LEFT_EYE |
| 2 | CHEST_SPINE | 11 | RIGHT_CLAVICLE | 20 | LEFT_ANKLE | 29 | LEFT_EAR |
| 3 | NECK | 12 | RIGHT_SHOULDER | 21 | LEFT_FOOT | 30 | RIGHT_EYE |
| 4 | LEFT_CLAVICLE | 13 | RIGHT_ELBOW | 22 | RIGHT_HIP | 31 | RIGHT_EAR |
| 5 | LEFT_SHOULDER | 14 | RIGHT_WRIST | 23 | RIGHT_KNEE | 32 | LEFT_HEEL |
| 6 | LEFT_ELBOW | 15 | RIGHT_HAND | 24 | RIGHT_ANKLE | 33 | RIGHT_HEEL |
| 7 | LEFT_WRIST | 16 | RIGHT_HANDTIP | 25 | RIGHT_FOOT | | |
| 8 | LEFT_HAND | 17 | RIGHT_THUMB | 26 | HEAD | | |

The BODY_38 body format contains 38 keypoints following this configuration:

Each keypoint is indexed by an integer from 0 to 37:

| keypoint index | keypoint name | keypoint index | keypoint name | keypoint index | keypoint name | keypoint index | keypoint name |
|---|---|---|---|---|---|---|---|
| 0 | PELVIS | 10 | LEFT_CLAVICLE | 20 | LEFT_KNEE | 30 | LEFT_HAND_THUMB_4 |
| 1 | SPINE_1 | 11 | RIGHT_CLAVICLE | 21 | RIGHT_KNEE | 31 | RIGHT_HAND_THUMB_4 |
| 2 | SPINE_2 | 12 | LEFT_SHOULDER | 22 | LEFT_ANKLE | 32 | LEFT_HAND_INDEX_1 |
| 3 | SPINE_3 | 13 | RIGHT_SHOULDER | 23 | RIGHT_ANKLE | 33 | RIGHT_HAND_INDEX_1 |
| 4 | NECK | 14 | LEFT_ELBOW | 24 | LEFT_BIG_TOE | 34 | LEFT_HAND_MIDDLE_4 |
| 5 | NOSE | 15 | RIGHT_ELBOW | 25 | RIGHT_BIG_TOE | 35 | RIGHT_HAND_MIDDLE_4 |
| 6 | LEFT_EYE | 16 | LEFT_WRIST | 26 | LEFT_SMALL_TOE | 36 | LEFT_HAND_PINKY_1 |
| 7 | RIGHT_EYE | 17 | RIGHT_WRIST | 27 | RIGHT_SMALL_TOE | 37 | RIGHT_HAND_PINKY_1 |
| 8 | LEFT_EAR | 18 | LEFT_HIP | 28 | LEFT_HEEL | | |
| 9 | RIGHT_EAR | 19 | RIGHT_HIP | 29 | RIGHT_HEEL | | |

The ZED SDK can output 3 levels of information: raw 2D/3D body detection, 3D body tracking and 3D body fitting.

2D/3D Body detection

The ZED SDK first runs neural networks on the ZED camera image to infer all 2D bones and keypoints. The SDK depth module and positional tracking module are then used together to extract the 3D position of each bone and keypoint.
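
To give an intuition for how a 2D keypoint and a depth value combine into a 3D position, here is a minimal sketch using the standard pinhole camera model. It is not the SDK's internal implementation, which is not described on this page. The function name `backproject`, the intrinsics values, and the camera frame convention (x right, y down, z forward) are assumptions made for the example.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at the given depth into a 3D camera-frame point.

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point.
    Returns None when the depth is missing or invalid.
    """
    if not math.isfinite(depth) or depth <= 0:
        return None
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return [x, y, depth]


# Hypothetical intrinsics for a 1280x720 image.
fx = fy = 700.0
cx, cy = 640.0, 360.0

# A keypoint detected 350 pixels right of the image center, 2 m away.
point = backproject(990.0, 360.0, 2.0, fx, fy, cx, cy)
```

In this example the keypoint lands 1 m to the right of the optical axis at 2 m depth. A keypoint whose depth cannot be measured yields no 3D position, which is one reason later filtering and fitting stages are useful.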

3D body tracking

If tracking is enabled, the ZED SDK will assign an identity to each detected body over time. At the same time, by filtering the raw body detection, it will output a more stable 3D body estimation.
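
The effect of per-identity filtering can be illustrated with a simple exponential moving average over the keypoints of each tracked ID. This is only a sketch of the general idea: the filter actually used by the ZED SDK is not specified here, and the class name `KeypointSmoother` and the `alpha` parameter are hypothetical.

```python
class KeypointSmoother:
    """Exponential moving average of 3D keypoints, kept separately per body ID."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state = {}  # body ID -> last smoothed list of [x, y, z]

    def update(self, body_id, keypoints):
        """Blend new keypoints with the previous estimate for this ID."""
        previous = self._state.get(body_id)
        if previous is None:
            # First observation of this identity: nothing to blend with yet.
            smoothed = [list(kp) for kp in keypoints]
        else:
            a = self.alpha
            smoothed = [
                [a * new_c + (1.0 - a) * old_c for new_c, old_c in zip(new, old)]
                for new, old in zip(keypoints, previous)
            ]
        self._state[body_id] = smoothed
        return smoothed


smoother = KeypointSmoother(alpha=0.5)
first = smoother.update(1, [[0.0, 0.0, 0.0]])
second = smoother.update(1, [[2.0, 4.0, 6.0]])  # pulled halfway toward the new value
```

A lower `alpha` gives a steadier but more delayed estimate. Keeping the state keyed by ID is what makes the stable identity assigned by tracking a prerequisite for this kind of temporal filtering.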

3D body fitting

In addition, a user can enable fitting to obtain more information about each tracked identity. The fitting process uses the history of each tracked person to deduce all missing keypoints, relying on the human kinematic constraints used by the body tracking module. It can also extract the local rotation between a pair of neighboring bones by solving the inverse kinematics problem. This data is compatible with some avatar animation software. For example, BODY_FORMAT::BODY_34 has been used to animate an avatar in Unreal.
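
To make the notion of a local rotation between neighboring bones concrete, the sketch below computes the shortest-arc quaternion that rotates one bone direction onto the next, in [x, y, z, w] order. This is a simplification and not the SDK's fitting algorithm: a full inverse kinematics solve also handles joint limits and twist around the bone axis, which two direction vectors alone cannot recover. All function names and the sample coordinates are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length; raises ValueError for a zero vector."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return [c / norm for c in v]


def cross(a, b):
    """Cross product of two 3D vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_between(a, b):
    """Shortest-arc unit quaternion [x, y, z, w] rotating direction a onto b."""
    a, b = normalize(a), normalize(b)
    dot = sum(p * q for p, q in zip(a, b))
    if dot < -1.0 + 1e-9:
        # Opposite directions: rotate 180 degrees about any axis perpendicular to a.
        helper = [1.0, 0.0, 0.0] if abs(a[0]) < 0.9 else [0.0, 1.0, 0.0]
        axis = normalize(cross(a, helper))
        return [axis[0], axis[1], axis[2], 0.0]
    c = cross(a, b)
    return normalize([c[0], c[1], c[2], 1.0 + dot])


# Hypothetical 3D keypoints (meters) for one arm.
shoulder = [0.0, 0.0, 2.0]
elbow = [0.3, 0.0, 2.0]
wrist = [0.3, 0.25, 2.0]

# A bone is the vector between its two keypoints.
upper_arm = [e - s for e, s in zip(elbow, shoulder)]
forearm = [w - e for w, e in zip(wrist, elbow)]

# Rotation of the forearm relative to the upper arm (a 90 degree bend here).
elbow_rotation = rotation_between(upper_arm, forearm)
```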

Detection Outputs

Each detected person is stored as a structure in the ZED SDK called sl.BodyData.

| Body Data | Description | Output |
|---|---|---|
| ID | Fixed ID for identifying an object over time. | Integer |
| Tracking state | Defines if an object is currently tracked or lost. | Ok, Off, Searching, Terminate |
| Action state | Defines if an object is currently idle or moving. | Idle, Moving |
| Position | Provides the 3D position of the object according to the camera as a 3D vector (x,y,z). | [x, y, z] |
| Velocity | Provides the velocity of the object in space as a 3D vector (x,y,z). | [vx, vy, vz] |
| Dimensions | Provides the width, height and length of the object. | [width, height, length] |
| Detection confidence | A lower confidence means the object might not be localized perfectly or that its label is uncertain. | 0 - 100 |
| 2D bounding box | Defines the box surrounding the object in the image represented as four 2D points. | Four pixel coordinates |
| 3D bounding box | Defines the box surrounding the object in space represented as eight 3D points. | Eight 3D coordinates |
| Mask | Provides the pixels which really belong to the object and those of the background. | Binary mask |
| 2D keypoint | A set of useful points representing the human body, expressed in 2D. | A vector of [x, y] |
| Keypoint | A set of useful points representing the human body, expressed in 3D. | A vector of [x, y, z] |
| 2D head bounding box | Bounds the head with four 2D points. | Four pixel coordinates |
| 3D head bounding box | Bounds the head with eight 3D points. | Eight 3D coordinates |
| Head position | 3D head centroid. | [x, y, z] |
| Keypoint confidence | Per-keypoint detection confidence. | A vector of float |
| Local position per joint | Local position of each keypoint. | A vector of [x, y, z] |
| Local orientation per joint | Local rotation of each keypoint. | A vector of [x, y, z, w] |
| Global root orientation | Global root orientation of the body. | [x, y, z, w] |
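
When consuming these outputs, it is often useful to keep only the keypoints that are both valid and confidently detected. The sketch below uses a small stand-in class rather than the real sl.BodyData so that it runs without a camera. The class `BodySample`, its field names, the use of NaN for missing keypoints, and the 0 - 100 scale for keypoint confidence are all assumptions made for illustration and should be checked against the API reference.

```python
import math
from dataclasses import dataclass


@dataclass
class BodySample:
    """Hypothetical stand-in holding a few of the fields listed in the table."""
    id: int
    confidence: float          # detection confidence, 0 - 100
    keypoint: list             # one [x, y, z] per keypoint index
    keypoint_confidence: list  # one float per keypoint index (scale assumed)


def reliable_keypoints(body, min_confidence=50.0):
    """Map keypoint index -> [x, y, z] for finite, sufficiently confident keypoints."""
    result = {}
    pairs = zip(body.keypoint, body.keypoint_confidence)
    for index, (point, confidence) in enumerate(pairs):
        if confidence >= min_confidence and all(math.isfinite(c) for c in point):
            result[index] = point
    return result


nan = float("nan")
sample = BodySample(
    id=3,
    confidence=87.0,
    keypoint=[[0.1, -0.5, 2.0], [nan, nan, nan], [0.2, -0.2, 2.1]],
    keypoint_confidence=[92.0, 0.0, 35.0],
)

# Only index 0 passes: index 1 is missing and index 2 is below the threshold.
usable = reliable_keypoints(sample)
```

Returning a dictionary keyed by index keeps the link to the body format tables, so a caller can still tell which joint each surviving point belongs to.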

For more information on Body Tracking, see the Using the API page.