3D Object Detection Overview

Object detection is the ability to identify objects present in an image. Thanks to depth sensing and 3D information, the ZED camera can provide the 2D and 3D positions of the objects in the scene.

Important: At the moment, the 3D Object Detection API can detect and track only a few object classes, and it is available on all ZED cameras except the ZED 1. You can find the list of available object classes in our API Reference. For general object detection, use our PyTorch and TensorFlow integrations.

Since ZED SDK 3.6, a custom detector can be used with the API. The SDK ingests the 2D detections and computes the 3D information, such as the position and the 3D bounding box. See the Custom Detector page for more information.

How It Works

The ZED SDK uses AI and neural networks to determine which objects are present in both the left and right images. The SDK then computes the 3D position of each object, as well as its bounding box, using data from the depth module. The objects can also be tracked within the environment over time, even if the camera is in motion, thanks to data from the positional tracking module.
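To make the role of the depth data concrete, here is a minimal sketch of one common way to turn a 2D detection into a 3D position: take a representative depth inside the box and back-project the box center through a pinhole camera model. This is an illustration only, not the SDK's actual algorithm. The function name `backproject_box_center`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the image-style axes (x right, y down, z forward) are all assumptions made for this example.

```python
from statistics import median


def backproject_box_center(box, depth, fx, fy, cx, cy):
    """Estimate a 3D position (x, y, z) in meters for a 2D box.

    box   : (u_min, v_min, u_max, v_max) in pixels, max edges exclusive
    depth : 2D list of depth values in meters, indexed depth[v][u];
            None marks a pixel without a valid depth measurement
    fx, fy, cx, cy : pinhole intrinsics in pixels (hypothetical inputs)
    """
    u_min, v_min, u_max, v_max = box
    samples = [
        depth[v][u]
        for v in range(v_min, v_max)
        for u in range(u_min, u_max)
        if depth[v][u] is not None
    ]
    if not samples:
        return None  # no usable depth inside the box

    # The median is less sensitive than the mean to a few stray pixels.
    z = median(samples)

    # Back-project the box center through the pinhole model.
    u_c = (u_min + u_max) / 2.0
    v_c = (v_min + v_max) / 2.0
    x = (u_c - cx) * z / fx
    y = (v_c - cy) * z / fy
    return (x, y, z)


# Toy 4x4 depth map where every pixel is 2 m away.
flat_depth = [[2.0] * 4 for _ in range(4)]
center = backproject_box_center((2, 0, 4, 4), flat_depth, 2.0, 2.0, 2.0, 2.0)
```

In practice the intrinsics would come from the camera calibration, and the depth map from the depth module.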

3D Object Detection

The ZED SDK detects all objects present in the images and computes their 3D position and velocity. The distance from the camera to an object is expressed in metric units (e.g., meters) and is measured from the back of the camera's left eye to the object in the scene.
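As a small worked example, the distance and the speed of an object can be derived from its position and velocity vectors. The sketch below assumes that the distance is the Euclidean norm of the position vector expressed in the camera frame; the helper names `vector_norm`, `distance_to_camera`, and `speed` are hypothetical and not part of the SDK.

```python
import math


def vector_norm(vec):
    """Euclidean length of a 3D vector given as (x, y, z)."""
    x, y, z = vec
    return math.sqrt(x * x + y * y + z * z)


def distance_to_camera(position):
    """Distance in the same unit as the position (e.g., meters)."""
    return vector_norm(position)


def speed(velocity):
    """Scalar speed from a velocity vector (vx, vy, vz)."""
    return vector_norm(velocity)


# An object 3 m to the side and 4 m up in this toy frame is 5 m away.
d = distance_to_camera((3.0, 4.0, 0.0))
```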

The SDK also computes a 2D mask that indicates which pixels of the left image belong to the object. From there, the ZED can output the 2D bounding boxes of the objects and accurately compute the 3D bounding boxes with the help of the depth map.
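The relationship between a mask and a 2D bounding box can be sketched as follows: the box is the tightest axis-aligned rectangle that contains every mask pixel. This is an illustrative sketch, not the SDK's implementation; the function name `bounding_box_from_mask` and the corner ordering (clockwise from top-left) are assumptions made here.

```python
def bounding_box_from_mask(mask):
    """Return the tight 2D box around the non-zero pixels of a mask.

    mask : 2D list of 0/1 values indexed as mask[row][col]
    Returns four (x, y) pixel corners, clockwise from the top-left,
    or None if the mask contains no object pixel.
    """
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for line in mask for c, value in enumerate(line) if value]
    if not rows:
        return None

    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
box = bounding_box_from_mask(toy_mask)
```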

The bounding boxes of the objects can be displayed over the image or the point cloud as depicted in the image above. For more information on how to display the bounding boxes and the object mask, please see our Object Detection sample.

3D Object Tracking

If the positional tracking module is activated, the ZED SDK can track objects within the environment. Each detected object then keeps the same ID throughout the sequence, which makes it possible to display the objects' paths over time. The object detection sample includes a view of these paths (depicted in the figure below).
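Because the ID stays stable, building a path per object comes down to grouping positions by ID across frames. The sketch below shows that bookkeeping with plain Python data; the function `update_paths` and the `(object_id, position)` tuples are hypothetical stand-ins for whatever the application reads from the SDK on each frame.

```python
from collections import defaultdict


def update_paths(paths, detections):
    """Append each detection's position to the path stored under its ID.

    paths      : dict mapping object ID -> list of (x, y, z) positions
    detections : iterable of (object_id, (x, y, z)) for one frame
    """
    for object_id, position in detections:
        paths[object_id].append(position)
    return paths


paths = defaultdict(list)

# Frame 1: two objects are visible.
update_paths(paths, [(0, (0.0, 0.0, 2.0)), (1, (1.0, 0.0, 3.0))])
# Frame 2: only object 0 is detected again, so only its path grows.
update_paths(paths, [(0, (0.1, 0.0, 2.0))])
```

A real application would likely also cap the path length or drop paths whose object is no longer tracked.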

Detection Outputs

Each object is stored as a structure in the ZED SDK. This structure contains all the information regarding a detected object:

| Object Data | Description | Output |
|---|---|---|
| ID | Fixed ID for identifying an object over time. | Integer |
| Label | Identifies the object type. | Person, Vehicle |
| Tracking state | Defines if an object is currently tracked or lost. | Ok, Off, Searching, Terminate |
| Action state | Defines if an object is currently idle or moving. | Idle, Moving |
| Position | Provides the 3D position of the object according to the camera as a 3D vector (x,y,z). | [x, y, z] |
| Velocity | Provides the velocity of the object in space as a 3D vector (x,y,z). | [vx, vy, vz] |
| Dimensions | Provides the width, height and length of the object. | [width, height, length] |
| Detection confidence | A lower confidence means the object might not be localized perfectly or that its label is uncertain. | 0 - 100 |
| 2D bounding box | Defines the box surrounding the object in the image represented as four 2D points. | Four pixel coordinates |
| 3D bounding box | Defines the box surrounding the object in space represented as eight 3D points. | Eight 3D coordinates |
| Mask | Provides the pixels which really belong to the object and those of the background. | Binary mask |
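To show how these fields fit together in application code, here is a minimal sketch that mirrors the table as a Python dataclass and filters detections by tracking state and confidence. The class `DetectedObject`, its field names, the function `keep_reliable`, and the threshold of 50 are all hypothetical choices for this example; they are not the SDK's own structure, whose exact names are listed in the API Reference.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DetectedObject:
    """Illustrative container mirroring the fields in the table above."""
    id: int
    label: str                # e.g. "Person", "Vehicle"
    tracking_state: str       # "Ok", "Off", "Searching", "Terminate"
    action_state: str         # "Idle", "Moving"
    position: Vec3            # [x, y, z]
    velocity: Vec3            # [vx, vy, vz]
    dimensions: Vec3          # [width, height, length]
    confidence: float         # 0 - 100
    bounding_box_2d: List[Tuple[int, int]] = field(default_factory=list)
    bounding_box_3d: List[Vec3] = field(default_factory=list)
    mask: List[List[int]] = field(default_factory=list)


def keep_reliable(objects, min_confidence=50.0):
    """Keep objects that are currently tracked and confident enough.

    The default threshold is an arbitrary value picked for this sketch.
    """
    return [
        obj for obj in objects
        if obj.tracking_state == "Ok" and obj.confidence >= min_confidence
    ]


detections = [
    DetectedObject(0, "Person", "Ok", "Moving",
                   (0.0, 0.0, 2.0), (0.5, 0.0, 0.0), (0.6, 1.8, 0.4), 92.0),
    DetectedObject(1, "Vehicle", "Searching", "Idle",
                   (3.0, 0.0, 8.0), (0.0, 0.0, 0.0), (1.9, 1.5, 4.2), 88.0),
    DetectedObject(2, "Person", "Ok", "Idle",
                   (1.0, 0.0, 5.0), (0.0, 0.0, 0.0), (0.5, 1.7, 0.4), 30.0),
]
reliable = keep_reliable(detections)
```

Only the first detection passes both checks: the second is not currently tracked, and the third falls below the confidence threshold.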

For more information on Object Detection, see the Using the API page.