Body Tracking Overview

The body tracking module detects and tracks a person's bones. Each detected bone is represented by its two endpoints, called keypoints. The ZED camera can provide 2D and 3D information for each detected keypoint. It also provides the local rotation between neighboring bones.

How It Works

The overall process is very similar to that of the ZED SDK Object Detection module, and the two modules share some outputs, such as the 3D position and 3D velocity of each person. The body tracking module uses a neural network to detect keypoints, then relies on the depth and positional tracking modules of the ZED SDK to compute the final 3D position of each keypoint. The ZED SDK supports multiple body formats:

BODY_18
BODY_34
BODY_38

The BODY_18 body format contains 18 keypoints following the COCO18 skeleton representation.

Each keypoint is indexed by an integer from 0 to 17:

keypoint index	keypoint name	keypoint index	keypoint name
0	NOSE	9	RIGHT_KNEE
1	NECK	10	RIGHT_ANKLE
2	RIGHT_SHOULDER	11	LEFT_HIP
3	RIGHT_ELBOW	12	LEFT_KNEE
4	RIGHT_WRIST	13	LEFT_ANKLE
5	LEFT_SHOULDER	14	RIGHT_EYE
6	LEFT_ELBOW	15	LEFT_EYE
7	LEFT_WRIST	16	RIGHT_EAR
8	RIGHT_HIP	17	LEFT_EAR
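As a minimal sketch, the BODY_18 table above can be written as a plain Python list so that indices and names can be converted in both directions. The names `BODY_18_KEYPOINTS`, `keypoint_name` and `keypoint_index` are illustrative helpers introduced here, not part of the ZED SDK.

```python
# Keypoint names for the BODY_18 format, in index order (copied from the table above).
# This list is an illustration only; it is not the enumeration shipped with the ZED SDK.
BODY_18_KEYPOINTS = [
    "NOSE", "NECK",
    "RIGHT_SHOULDER", "RIGHT_ELBOW", "RIGHT_WRIST",
    "LEFT_SHOULDER", "LEFT_ELBOW", "LEFT_WRIST",
    "RIGHT_HIP", "RIGHT_KNEE", "RIGHT_ANKLE",
    "LEFT_HIP", "LEFT_KNEE", "LEFT_ANKLE",
    "RIGHT_EYE", "LEFT_EYE", "RIGHT_EAR", "LEFT_EAR",
]


def keypoint_name(index):
    """Return the BODY_18 keypoint name for an index in the range 0..17."""
    if not 0 <= index < len(BODY_18_KEYPOINTS):
        raise IndexError(f"BODY_18 index out of range: {index}")
    return BODY_18_KEYPOINTS[index]


def keypoint_index(name):
    """Return the BODY_18 index for a keypoint name (raises ValueError if unknown)."""
    return BODY_18_KEYPOINTS.index(name)
```

The same pattern applies to the other body formats by swapping in their own name lists.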

The BODY_34 body format contains 34 keypoints.

Each keypoint is indexed by an integer from 0 to 33:

keypoint index	keypoint name	keypoint index	keypoint name	keypoint index	keypoint name	keypoint index	keypoint name
0	PELVIS	9	LEFT_HANDTIP	18	LEFT_HIP	27	NOSE
1	NAVAL_SPINE	10	LEFT_THUMB	19	LEFT_KNEE	28	LEFT_EYE
2	CHEST_SPINE	11	RIGHT_CLAVICLE	20	LEFT_ANKLE	29	LEFT_EAR
3	NECK	12	RIGHT_SHOULDER	21	LEFT_FOOT	30	RIGHT_EYE
4	LEFT_CLAVICLE	13	RIGHT_ELBOW	22	RIGHT_HIP	31	RIGHT_EAR
5	LEFT_SHOULDER	14	RIGHT_WRIST	23	RIGHT_KNEE	32	LEFT_HEEL
6	LEFT_ELBOW	15	RIGHT_HAND	24	RIGHT_ANKLE	33	RIGHT_HEEL
7	LEFT_WRIST	16	RIGHT_HANDTIP	25	RIGHT_FOOT
8	LEFT_HAND	17	RIGHT_THUMB	26	HEAD

The BODY_38 body format contains 38 keypoints.

Each keypoint is indexed by an integer from 0 to 37:

keypoint index	keypoint name	keypoint index	keypoint name	keypoint index	keypoint name	keypoint index	keypoint name
0	PELVIS	10	LEFT_CLAVICLE	20	LEFT_KNEE	30	LEFT_HAND_THUMB_4
1	SPINE_1	11	RIGHT_CLAVICLE	21	RIGHT_KNEE	31	RIGHT_HAND_THUMB_4
2	SPINE_2	12	LEFT_SHOULDER	22	LEFT_ANKLE	32	LEFT_HAND_INDEX_1
3	SPINE_3	13	RIGHT_SHOULDER	23	RIGHT_ANKLE	33	RIGHT_HAND_INDEX_1
4	NECK	14	LEFT_ELBOW	24	LEFT_BIG_TOE	34	LEFT_HAND_MIDDLE_4
5	NOSE	15	RIGHT_ELBOW	25	RIGHT_BIG_TOE	35	RIGHT_HAND_MIDDLE_4
6	LEFT_EYE	16	LEFT_WRIST	26	LEFT_SMALL_TOE	36	LEFT_HAND_PINKY_1
7	RIGHT_EYE	17	RIGHT_WRIST	27	RIGHT_SMALL_TOE	37	RIGHT_HAND_PINKY_1
8	LEFT_EAR	18	LEFT_HIP	28	LEFT_HEEL
9	RIGHT_EAR	19	RIGHT_HIP	29	RIGHT_HEEL

The ZED SDK can output three levels of information: raw 2D/3D body detection, 3D body tracking, and 3D body fitting.

2D/3D Body detection

The ZED SDK first runs neural networks on the ZED camera image to infer all 2D bones and keypoints. The depth and positional tracking modules of the SDK are then used together to extract the 3D position of each bone and keypoint.
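To illustrate how a 2D keypoint and a depth value can be combined into a 3D position, here is a minimal sketch of standard pinhole back-projection. It is not the SDK's internal method: the function `backproject`, the intrinsics `fx`, `fy`, `cx`, `cy` and the numeric values are assumptions chosen for the example, and the axes follow the usual image convention (x right, y down, z forward), which may differ from the coordinate system configured in the SDK.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a depth along the optical axis to a 3D point.

    Generic pinhole camera model, shown for illustration only.
    fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics and a keypoint detected at the principal point, 2 m away.
center_point = backproject(u=640.0, v=360.0, depth=2.0,
                           fx=700.0, fy=700.0, cx=640.0, cy=360.0)

# A keypoint 350 pixels to the right of the principal point at the same depth.
right_point = backproject(u=990.0, v=360.0, depth=2.0,
                          fx=700.0, fy=700.0, cx=640.0, cy=360.0)
```

With these assumed values, the first keypoint lies on the optical axis and the second lies 1 m to its right.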

3D body tracking

If tracking is enabled, the ZED SDK assigns an identity to each detected body and maintains it over time. It also filters the raw body detections to output a more stable 3D body estimate.
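The filter used by the SDK is not described here. As a rough illustration of why a persistent identity makes filtering possible, the sketch below applies an exponential moving average to the 3D keypoints of each body, keyed by its ID. The class `KeypointSmoother` and the `alpha` parameter are assumptions made for this example, not SDK features.

```python
class KeypointSmoother:
    """Exponential moving average over 3D keypoints, kept separately per body ID.

    Illustrative only: a stand-in for whatever filtering the SDK applies.
    """

    def __init__(self, alpha=0.5):
        # alpha close to 1.0 follows new detections quickly; close to 0.0 smooths more.
        self.alpha = alpha
        self._state = {}

    def update(self, body_id, keypoints):
        """Blend the new keypoints with the previous estimate for this body ID."""
        previous = self._state.get(body_id)
        if previous is None:
            # First observation of this ID: nothing to blend with yet.
            smoothed = [tuple(p) for p in keypoints]
        else:
            a = self.alpha
            smoothed = [
                tuple(a * new + (1.0 - a) * old for new, old in zip(p, q))
                for p, q in zip(keypoints, previous)
            ]
        self._state[body_id] = smoothed
        return smoothed


smoother = KeypointSmoother(alpha=0.5)
first = smoother.update(7, [(0.0, 0.0, 2.0)])
second = smoother.update(7, [(1.0, 0.0, 2.0)])  # halfway between old and new
```

Without a stable ID, consecutive detections could not be matched to the same person, so there would be no history to filter against.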

3D body fitting

Fitting can also be enabled to obtain more information about each tracked identity. The fitting process uses the history of each tracked person to infer any missing keypoints, based on the human kinematic constraints used by the body tracking module. It also extracts the local rotation between each pair of neighboring bones by solving the inverse kinematics problem. This data is compatible with some common avatar animation software. For example, BODY_FORMAT::BODY_34 has been used to animate an avatar in Unreal.
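To give a feel for what a rotation between two neighboring bones looks like as a quaternion, the sketch below computes the shortest-arc rotation that turns one bone direction onto another, in `[x, y, z, w]` order. This is a simplified geometric illustration, not the inverse kinematics solver used by the SDK. The helper names `cross`, `normalize` and `rotation_between` are introduced here for the example.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / n for c in v)


def rotation_between(a, b):
    """Shortest-arc unit quaternion (x, y, z, w) rotating direction a onto direction b."""
    a, b = normalize(a), normalize(b)
    dot = sum(p * q for p, q in zip(a, b))
    if dot < -1.0 + 1e-9:
        # Opposite directions: rotate 180 degrees about any axis perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = normalize(cross(a, helper))
        return (axis[0], axis[1], axis[2], 0.0)
    x, y, z = cross(a, b)
    w = 1.0 + dot
    n = math.sqrt(x * x + y * y + z * z + w * w)
    return (x / n, y / n, z / n, w / n)


# Two hypothetical bone directions at a right angle: a 90 degree rotation about the z axis.
elbow_rotation = rotation_between((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

A full fitting step would also account for bone twist and joint limits, which two direction vectors alone cannot capture.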

Detection Outputs

Each detected person is stored as a structure in the ZED SDK called sl.BodyData.

Body Data	Description	Output
ID	Fixed ID for identifying an object over time.	`Integer`
Tracking state	Defines if an object is currently tracked or lost.	`Ok`, `Off`, `Searching`, `Terminate`
Action state	Defines if an object is currently idle or moving.	`Idle`, `Moving`
Position	Provides the 3D position of the object according to the camera as a 3D vector (x,y,z).	`[x, y, z]`
Velocity	Provides the velocity of the object in space as a 3D vector (x,y,z).	`[v_x, v_y, v_z]`
Dimensions	Provides the width, height and length of the object.	`[width, height, length]`
Detection confidence	A lower confidence means the object might not be localized perfectly or that its label is uncertain.	`0 - 100`
2D bounding box	Defines the box surrounding the object in the image represented as four 2D points.	`Four pixel coordinates`
3D bounding box	Defines the box surrounding the object in space represented as eight 3D points.	`Eight 3D coordinates`
Mask	Identifies which pixels belong to the object and which belong to the background.	`Binary mask`
2D keypoint	A set of useful points representing the human body, expressed in 2D.	A vector of `[x, y]`
Keypoint	A set of useful points representing the human body, expressed in 3D.	A vector of `[x, y, z]`
2D head bounding box	Bounds the head with four 2D points.	`Four pixel coordinates`
3D head bounding box	Bounds the head with eight 3D points.	`Eight 3D coordinates`
Head position	3D centroid of the head.	`[x, y, z]`
Keypoint confidence	Detection confidence of each keypoint.	A vector of `float`
Local position per joint	Local position of each keypoint.	A vector of `[x, y, z]`
Local orientation per joint	Local rotation of each keypoint.	A vector of `[x, y, z, w]`
Global root orientation	Global orientation of the body's root.	`[x, y, z, w]`
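As a sketch of how these outputs might be consumed, the example below keeps only the keypoints whose confidence reaches a chosen threshold. `BodySample` is a hypothetical stand-in that mirrors a few of the fields listed above. It is not the SDK's sl.BodyData structure, and the threshold and sample values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class BodySample:
    """Hypothetical stand-in holding a few of the fields from the table above."""
    id: int
    tracking_state: str
    confidence: float
    keypoint: list             # one [x, y, z] entry per keypoint
    keypoint_confidence: list  # one float per keypoint, same order as keypoint


def reliable_keypoints(body, min_confidence):
    """Map keypoint index to 3D position for keypoints at or above min_confidence."""
    return {
        index: position
        for index, (position, score) in enumerate(
            zip(body.keypoint, body.keypoint_confidence)
        )
        if score >= min_confidence
    }


# Arbitrary sample data: three keypoints, the second one poorly detected.
sample = BodySample(
    id=3,
    tracking_state="Ok",
    confidence=87.0,
    keypoint=[[0.0, 0.1, 2.0], [0.0, 0.3, 2.0], [0.2, 0.3, 2.1]],
    keypoint_confidence=[92.0, 15.0, 64.0],
)
kept = reliable_keypoints(sample, min_confidence=50.0)
```

Keeping the original indices as dictionary keys means each remaining position can still be matched to its keypoint name in the chosen body format.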

For more information on Body Tracking, see the Using the API page.