Using the Object Detection API

Object Detection Configuration

To configure the object detection module, use ObjectDetectionParameters at initialization and ObjectDetectionRuntimeParameters to change specific parameters during use.

// Set initialization parameters
ObjectDetectionParameters detection_parameters;
detection_parameters.enable_tracking = true; // Objects will keep the same ID between frames
detection_parameters.enable_segmentation = true; // Outputs 2D masks over detected objects

// Set runtime parameters
ObjectDetectionRuntimeParameters detection_parameters_rt;
detection_parameters_rt.detection_confidence_threshold = 25;

Several object box detection models are available in the ZED SDK:

General-purpose object detection: OBJECT_DETECTION_MODEL::MULTI_CLASS_BOX, OBJECT_DETECTION_MODEL::MULTI_CLASS_BOX_MEDIUM and OBJECT_DETECTION_MODEL::MULTI_CLASS_BOX_ACCURATE. Choose one depending on the balance you need between performance and accuracy. These models detect multiple object classes (see OBJECT_CLASS).
Head detection: OBJECT_DETECTION_MODEL::PERSON_HEAD_BOX. This model specializes in detecting and tracking people's heads. It can help in crowded scenes where people in the background are barely detected by the general-purpose person detection model. It is kept separate from the general-purpose models and includes dedicated optimizations that improve detection and tracking accuracy. It detects a single class, OBJECT_CLASS::PERSON, with subclass OBJECT_SUBCLASS::PERSON_HEAD.

You can use detection_parameters.detection_model to set the detection model:

// Choose a detection model
detection_parameters.detection_model = OBJECT_DETECTION_MODEL::MULTI_CLASS_BOX;

If you want to track objects’ motion within their environment, you will first need to activate the positional tracking module. Then, set detection_parameters.enable_tracking to true.

if (detection_parameters.enable_tracking) {
    // Set positional tracking parameters
    PositionalTrackingParameters positional_tracking_parameters;
    // Enable positional tracking
    zed.enablePositionalTracking(positional_tracking_parameters);
}

With these parameters configured, you can enable the object detection module:

// Enable object detection with initialization parameters
zed_error = zed.enableObjectDetection(detection_parameters);
if (zed_error != ERROR_CODE::SUCCESS) {
    cout << "enableObjectDetection: " << zed_error << "\nExit program.";
    zed.close();
    exit(-1);
}

Object Detection has been optimized for the ZED Mini, ZED 2i, ZED X, ZED X Mini, and ZED X Nano, and uses the camera's motion sensors for improved reliability. The Object Detection module therefore requires one of these cameras, and the inertial sensors cannot be disabled while the module is in use.

Getting Object Data

To get the detected objects in a scene, get a new image with grab(...) and extract the detected objects with retrieveObjects(). The objects’ 2D positions are relative to the left image, while the 3D positions are either in the CAMERA or WORLD reference frame depending on RuntimeParameters.measure3D_reference_frame (given to the grab() function).

sl::Objects objects; // Structure containing all the detected objects
if (zed.grab() == ERROR_CODE::SUCCESS) {
    zed.retrieveObjects(objects, detection_parameters_rt); // Retrieve the detected objects
}

The sl::Objects class stores all the objects detected in a given frame in its object_list attribute, a vector of sl::ObjectData. Each sl::ObjectData holds the information about one object, such as its bounding box, position, and mask. sl::Objects also contains the timestamp of the detection, which helps match the objects to the corresponding images.

You can iterate through the objects as follows:

for (const auto& object : objects.object_list)
    std::cout << object.id << " " << object.position << std::endl;

Each detected object can be accessed by using its ID as follows:

sl::ObjectData object;
objects.getObjectDataFromId(object, 0); // Get the object with ID = 0

Accessing Object Information

Once an sl::ObjectData is retrieved from the object vector, you can access information such as its ID, position, velocity, label, and tracking_state:

unsigned int object_id = object.id; // Get the object id
sl::float3 object_position = object.position; // Get the object position
sl::float3 object_velocity = object.velocity; // Get the object velocity
sl::OBJECT_TRACKING_STATE object_tracking_state = object.tracking_state; // Get the tracking state of the object
if (object_tracking_state == sl::OBJECT_TRACKING_STATE::OK) {
    cout << "Object " << object_id << " is tracked" << endl;
}
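
The velocity is a 3D vector, so a single speed value has to be derived from it. The sketch below is one possible way to do that, not part of the ZED SDK: Vec3 is a hypothetical stand-in for sl::float3 so that the code compiles on its own, and speed() and isMoving() are illustrative helper names.

```cpp
#include <cmath>

// Hypothetical stand-in for sl::float3, so this sketch builds without the ZED SDK.
struct Vec3 {
    float x, y, z;
};

// Euclidean norm of the velocity vector: the object's speed, expressed in the
// same distance unit per second as the velocity components.
inline float speed(const Vec3& velocity) {
    return std::sqrt(velocity.x * velocity.x +
                     velocity.y * velocity.y +
                     velocity.z * velocity.z);
}

// True when the object moves strictly faster than min_speed (same unit as speed()).
inline bool isMoving(const Vec3& velocity, float min_speed) {
    return speed(velocity) > min_speed;
}
```

With the SDK types, the same computation would read the x, y, and z components of object.velocity. The threshold passed to isMoving() is application-specific and should be chosen with sensor noise in mind.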

You can also access the detection confidence of each object. This value indicates how likely it is that the detected object is really present in the scene, so it can be used to post-filter the detections. Confidence is expressed on the same 0 to 100 scale as detection_confidence_threshold. For example, you can ignore objects with a confidence below 10%:

for (const auto& object : objects.object_list) {
    if (object.confidence < 10.f) // Confidence ranges from 0 to 100
        continue;
    // Work with the remaining objects
}
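
If the filtered objects are needed in several places, the same check can be wrapped in a small helper. This is a minimal sketch under stated assumptions: DetectedObject is a hypothetical stand-in for the two sl::ObjectData fields used here, filterByConfidence() is an illustrative name, and min_confidence must be given on the same scale as the confidence values (assumed to be 0 to 100, as the threshold of 25 set earlier suggests).

```cpp
#include <vector>

// Hypothetical stand-in for the sl::ObjectData fields this sketch needs.
struct DetectedObject {
    int id;
    float confidence;  // Assumed to share the scale of min_confidence below
};

// Returns a copy of the objects whose confidence is at least min_confidence,
// preserving their original order.
std::vector<DetectedObject> filterByConfidence(
        const std::vector<DetectedObject>& objects, float min_confidence) {
    std::vector<DetectedObject> kept;
    kept.reserve(objects.size());
    for (const auto& object : objects) {
        if (object.confidence >= min_confidence)
            kept.push_back(object);
    }
    return kept;
}
```

Returning a new vector leaves the original detection list untouched, which is convenient when the unfiltered list is still needed for display or logging.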

Getting 3D Bounding Boxes

Each detected object contains two bounding boxes: a 2D bounding box and a 3D bounding box. The 2D bounding box is defined in the image frame while the 3D bounding box is provided with the depth information.

The 2D bounding box is represented as four 2D points starting from the top-left corner of the object. The 3D bounding box is represented by eight 3D points starting from the top-left front corner.

The 2D and 3D bounding boxes are accessible in sl::ObjectData:

vector<sl::uint2> object_2Dbbox = object.bounding_box_2d; // Get the 2D bounding box of the object
vector<sl::float3> object_3Dbbox = object.bounding_box; // Get the 3D bounding box of the object
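
Because both boxes are plain lists of corner points, their sizes can be derived with a min/max pass over the corners. The sketch below illustrates this under stated assumptions: Pixel and Point3 are hypothetical stand-ins for sl::uint2 and sl::float3, and size2D() and extents3D() are illustrative helpers, not SDK functions. Using min/max means the result does not depend on the order of the corners.

```cpp
#include <algorithm>
#include <vector>

struct Pixel { unsigned int x, y; };   // Hypothetical stand-in for sl::uint2
struct Point3 { float x, y, z; };      // Hypothetical stand-in for sl::float3

struct Size2 { unsigned int width, height; };
struct Extent3 { float dx, dy, dz; };

// Width and height, in pixels, of the rectangle spanned by the 2D corners.
// An empty corner list yields a zero size.
Size2 size2D(const std::vector<Pixel>& corners) {
    if (corners.empty()) return {0, 0};
    Pixel lo = corners.front(), hi = corners.front();
    for (const auto& c : corners) {
        lo.x = std::min(lo.x, c.x); hi.x = std::max(hi.x, c.x);
        lo.y = std::min(lo.y, c.y); hi.y = std::max(hi.y, c.y);
    }
    return {hi.x - lo.x, hi.y - lo.y};
}

// Extents of the 3D corners along the axes of their reference frame.
// For a box rotated relative to those axes this overestimates its true size.
Extent3 extents3D(const std::vector<Point3>& corners) {
    if (corners.empty()) return {0.f, 0.f, 0.f};
    Point3 lo = corners.front(), hi = corners.front();
    for (const auto& c : corners) {
        lo.x = std::min(lo.x, c.x); hi.x = std::max(hi.x, c.x);
        lo.y = std::min(lo.y, c.y); hi.y = std::max(hi.y, c.y);
        lo.z = std::min(lo.z, c.z); hi.z = std::max(hi.z, c.z);
    }
    return {hi.x - lo.x, hi.y - lo.y, hi.z - lo.z};
}
```

The 3D extents are expressed in the unit and reference frame of the corner points themselves, so they follow whatever coordinate settings the camera was opened with.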

Getting the Object Mask

Each object can also be represented by its mask. The mask includes the pixels within the 2D bounding box that belong to the object. Pixels from the object itself are set to 255 while the pixels of the background are set to 0. You can access the mask of an object with sl::Mat object_mask = object.mask;.
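
Since the mask only contains the values 255 and 0, the share of the bounding box covered by the object can be computed by counting pixels. The following is a minimal sketch, assuming the mask has already been copied into a flat buffer of 8-bit values; the buffer layout, countObjectPixels(), and coverage() are hypothetical and not part of the ZED SDK.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of pixels marked as belonging to the object (value 255).
// The mask is assumed to be a flat, single-channel 8-bit buffer.
std::size_t countObjectPixels(const std::vector<std::uint8_t>& mask) {
    return static_cast<std::size_t>(
        std::count(mask.begin(), mask.end(), std::uint8_t{255}));
}

// Fraction of the mask, between 0 and 1, covered by the object.
// An empty mask yields 0.
double coverage(const std::vector<std::uint8_t>& mask) {
    if (mask.empty()) return 0.0;
    return static_cast<double>(countObjectPixels(mask)) /
           static_cast<double>(mask.size());
}
```

A low coverage value suggests that the object fills only a small part of its 2D bounding box, which can be useful when deciding whether to rely on the box or on the mask.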

Code Example

For code examples, check out the Tutorial and Sample on GitHub.