3D Object Detection Overview

Object detection is the ability to identify objects present in an image. Thanks to depth sensing and 3D information, the ZED camera is able to provide the 2D and 3D position of the objects in the scene.

Important: At the moment, only Persons and Vehicles can be detected and tracked with the 3D Object Detection API using ZED 2 cameras. For general object detection, use our PyTorch and TensorFlow integrations.

How It Works

The ZED SDK uses AI and neural networks to determine which objects are present in both the left and right images. The SDK then computes the 3D position of each object, as well as its bounding box, using data from the depth module. The objects can also be tracked within the environment over time, even if the camera is in motion, thanks to data from the positional tracking module.

3D Object Detection

The ZED SDK detects all objects present in the images and computes their 3D position and velocity. The distance of the object from the camera is expressed in metric units (e.g., meters) and calculated from the back of the left eye of the camera to the scene object.
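
As a minimal sketch, assuming the reported position is an (x, y, z) vector in meters whose origin is the camera's left eye, the distance to an object is the Euclidean norm of that vector, and the same norm applied to the velocity vector gives a scalar speed. The function names below are illustrative and are not part of the ZED SDK.

```python
import math


def distance_to_camera(position):
    """Euclidean distance from the camera origin to a 3D position (x, y, z)."""
    x, y, z = position
    return math.sqrt(x * x + y * y + z * z)


def speed(velocity):
    """Scalar speed from a 3D velocity vector (vx, vy, vz)."""
    vx, vy, vz = velocity
    return math.sqrt(vx * vx + vy * vy + vz * vz)


# Hypothetical object 3 m to the side and 4 m in front of the camera.
print(distance_to_camera((3.0, 0.0, 4.0)))  # 5.0
```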

The SDK also computes a 2D mask that indicates which pixels of the left image belong to the object. From there, the ZED can output the 2D bounding boxes of the objects and accurately compute the 3D bounding boxes with the help of the depth map.
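
The relationship between a mask and its 2D bounding box can be illustrated with a short sketch. It assumes the mask is a list of rows holding 0 (background) or 1 (object); this is a stand-in for illustration, not the SDK's internal method or data type.

```python
def bounding_box_from_mask(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if the mask is empty."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 5x4 mask with a small object in the middle.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(bounding_box_from_mask(mask))  # (1, 1, 3, 2)
```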

The bounding boxes of the objects can be displayed over the image or the point cloud as depicted in the image above. For more information on how to display the bounding boxes and the object mask, please see our Object Detection sample.

3D Object Tracking

If the positional tracking module is activated, the ZED SDK can track the objects within the environment. This means that a detected object keeps the same ID throughout the sequence, which makes it possible to display each object's path over time. The object detection sample includes a view of these paths, as depicted below:
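
Separately from the sample's visualization, the ID-based bookkeeping can be sketched in a few lines: because an object keeps its ID across frames, its path is simply the list of positions recorded under that ID. The frame data and the `update_paths` helper are hypothetical and do not use the ZED SDK.

```python
from collections import defaultdict


def update_paths(paths, detections):
    """Append each detection's position to the path stored under its ID."""
    for object_id, position in detections:
        paths[object_id].append(position)
    return paths


# Three hypothetical frames of (id, position) pairs; object 1 is missed in frame 2.
frames = [
    [(0, (0.0, 0.0, 2.0)), (1, (1.0, 0.0, 3.0))],
    [(0, (0.1, 0.0, 2.0))],
    [(0, (0.2, 0.0, 2.1)), (1, (1.0, 0.0, 2.8))],
]

paths = defaultdict(list)
for detections in frames:
    update_paths(paths, detections)

for object_id, path in sorted(paths.items()):
    print(object_id, path)
```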

Detection Outputs

Each object is stored as a structure in the ZED SDK. This structure contains all the information regarding a detected object:

| Object Data | Description | Output |
|---|---|---|
| ID | Fixed ID for identifying an object over time. | Integer |
| Label | Identifies the object type. | Person, Vehicle |
| Tracking state | Defines if an object is currently tracked or lost. | Ok, Off, Searching, Terminate |
| Action state | Defines if an object is currently idle or moving. | Idle, Moving |
| Position | Provides the 3D position of the object according to the camera as a 3D vector (x,y,z). | [x, y, z] |
| Velocity | Provides the velocity of the object in space as a 3D vector (x,y,z). | [vx, vy, vz] |
| Dimensions | Provides the width, height and length of the object. | [width, height, length] |
| Detection confidence | A lower confidence means the object might not be localized perfectly or that its label is uncertain. | 0 - 100 |
| 2D bounding box | Defines the box surrounding the object in the image represented as four 2D points. | Four pixel coordinates |
| 3D bounding box | Defines the box surrounding the object in space represented as eight 3D points. | Eight 3D coordinates |
| Mask | Provides the pixels which really belong to the object and those of the background. | Binary mask |
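
To show how these fields might be consumed together, here is a minimal sketch that mirrors part of the structure as a plain Python dataclass and filters detections. The `DetectedObject` class, its field names, the assumption that "Ok" means the object is currently tracked, and the threshold of 50 are all illustrative choices, not the SDK's actual types or defaults.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DetectedObject:
    """Hypothetical stand-in for the per-object structure described above."""
    id: int
    label: str           # "Person" or "Vehicle"
    tracking_state: str  # "Ok", "Off", "Searching" or "Terminate"
    action_state: str    # "Idle" or "Moving"
    position: Vec3       # [x, y, z]
    velocity: Vec3       # [vx, vy, vz]
    dimensions: Vec3     # [width, height, length]
    confidence: float    # 0 - 100


def reliable_tracked(objects, min_confidence=50.0):
    """Keep objects whose state is "Ok" and whose confidence meets the threshold."""
    return [
        o for o in objects
        if o.tracking_state == "Ok" and o.confidence >= min_confidence
    ]


# Made-up detections for illustration only.
objects = [
    DetectedObject(1, "Person", "Ok", "Moving", (0.5, 0.0, 3.0), (0.2, 0.0, 0.0), (0.6, 1.8, 0.4), 87.0),
    DetectedObject(2, "Vehicle", "Searching", "Idle", (4.0, 0.0, 9.0), (0.0, 0.0, 0.0), (1.9, 1.5, 4.3), 92.0),
    DetectedObject(3, "Person", "Ok", "Idle", (-1.0, 0.0, 6.0), (0.0, 0.0, 0.0), (0.5, 1.7, 0.4), 31.0),
]

kept = reliable_tracked(objects)
print([o.id for o in kept])  # [1]
```

Filtering on both tracking state and confidence is one reasonable way to avoid acting on objects that are lost or poorly localized; the right threshold depends on the application.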

For more information on Object Detection, see the Using the API page.