Using the Object Detection API

Object Detection Configuration

To configure object detection, use ObjectDetectionParameters at initialization and ObjectDetectionRuntimeParameters to change specific parameters during use.

// Set initialization parameters
ObjectDetectionParameters detection_parameters;
detection_parameters.enable_tracking = true; // Objects will keep the same ID between frames
detection_parameters.enable_mask_output = true; // Will compute 2D masks

// Set runtime parameters
ObjectDetectionRuntimeParameters detection_parameters_rt;
detection_parameters_rt.detection_confidence_threshold = 25;
# Set initialization parameters
detection_parameters = sl.ObjectDetectionParameters()
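detection_parameters.enable_tracking = True # Objects will keep the same ID between frames
detection_parameters.enable_mask_output = True # Will compute 2D masks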

# Set runtime parameters
detection_parameters_rt = sl.ObjectDetectionRuntimeParameters()
detection_parameters_rt.detection_confidence_threshold = 25

If you want to track objects’ motion within their environment, you will first need to activate the positional tracking module. Then, set detection_parameters.enable_tracking to true.

if (detection_parameters.enable_tracking) {
    // Set positional tracking parameters
    PositionalTrackingParameters positional_tracking_parameters;
    positional_tracking_parameters.set_floor_as_origin = true;

    // Enable positional tracking
    zed.enablePositionalTracking(positional_tracking_parameters);
}
if detection_parameters.enable_tracking :
    # Set positional tracking parameters
    positional_tracking_parameters = sl.PositionalTrackingParameters()
    positional_tracking_parameters.set_floor_as_origin = True

    # Enable positional tracking
    zed.enable_positional_tracking(positional_tracking_parameters)

With these parameters configured, you can enable the object detection module like so:

// Enable object detection with initialization parameters
zed_error = zed.enableObjectDetection(detection_parameters);
if (zed_error != ERROR_CODE::SUCCESS) {
    cout << "enableObjectDetection: " << zed_error << "\nExit program.";
    zed.close();
    exit(-1);
}
# Enable object detection with initialization parameters
zed_error = zed.enable_object_detection(detection_parameters)
if zed_error != sl.ERROR_CODE.SUCCESS :
    print("enable_object_detection", zed_error, "\nExit program.")
    zed.close()
    exit(-1)

Warning: The object detection module requires the motion sensors to estimate gravity. Therefore, only the ZED2 and ZED-M are compatible, and the sensors must remain enabled while the module is in use.

Getting Object Data

To get the detected objects in a scene, get a new image with grab(...) and extract the detected objects with retrieveObjects(). The objects’ 2D positions are relative to the left image, while the 3D positions are expressed in either the CAMERA or the WORLD reference frame, depending on RuntimeParameters.measure3D_reference_frame (passed to the grab() function).

sl::Objects objects; // Structure containing all the detected objects
if (zed.grab() == ERROR_CODE::SUCCESS) {
  zed.retrieveObjects(objects, detection_parameters_rt); // Retrieve the detected objects
}
objects = sl.Objects() # Structure containing all the detected objects
if zed.grab() == sl.ERROR_CODE.SUCCESS :
  zed.retrieve_objects(objects, detection_parameters_rt) # Retrieve the detected objects

The sl::Objects class stores all the information about the objects present in the scene in its object_list attribute, a vector holding every object detected in a given frame. Each individual object is stored as an sl::ObjectData with all the information about it, such as bounding box, position, mask, etc. sl::Objects also contains the timestamp of the detection, which can help connect the objects to the images.

You can iterate through the objects as follows:

for(auto object : objects.object_list)
  std::cout << object.id << " " << object.position << std::endl;
for object in objects.object_list:
  print("{} {}".format(object.id, object.position))

Each detected object can be accessed by using its ID as follows:

sl::ObjectData object;
objects.getObjectDataFromId(object, 0); // Get the object with ID = 0
object = sl.ObjectData()
objects.get_object_data_from_id(object, 0) # Get the object with ID = 0

Accessing Object Information

Once an sl::ObjectData is retrieved from the object vector, you can access information such as its ID, position, velocity, label, and tracking_state:

unsigned int object_id = object.id; // Get the object id
sl::float3 object_position = object.position; // Get the object position
sl::float3 object_velocity = object.velocity; // Get the object velocity
sl::OBJECT_TRACKING_STATE object_tracking_state = object.tracking_state; // Get the tracking state of the object
if (object_tracking_state == sl::OBJECT_TRACKING_STATE::OK) {
    cout << "Object " << object_id << " is tracked" << endl;
}
object_id = object.id # Get the object id
object_position = object.position # Get the object position
object_velocity = object.velocity # Get the object velocity
object_tracking_state = object.tracking_state # Get the tracking state of the object
if object_tracking_state == sl.OBJECT_TRACKING_STATE.OK:
    print("Object {0} is tracked\n".format(object_id))
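
Since the velocity is a 3D vector, a scalar speed can be derived from it as the Euclidean norm. The following is a minimal sketch in plain Python: speed_from_velocity is a hypothetical helper, not part of the SDK, and the tuple is a stand-in for the three components of object.velocity.

```python
import math


def speed_from_velocity(velocity):
    """Return the Euclidean norm of a 3-component velocity vector."""
    vx, vy, vz = velocity
    return math.sqrt(vx * vx + vy * vy + vz * vz)


# Hypothetical values standing in for the components of object.velocity
example_velocity = (3.0, 0.0, 4.0)
print(speed_from_velocity(example_velocity))  # prints 5.0
```

The result is expressed in the same units as the velocity components themselves.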

You can also access the detection confidence of each object. It reflects how likely the detected object is to really be present in the scene, so it can be used to post-filter the detected objects. For example, you can ignore objects with a confidence below 10%:

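for(auto object : objects.object_list){
  if(object.confidence < 10.0f) // Confidence is on the same 0-100 scale as detection_confidence_threshold
    continue;
  // Work with other objects
}
for object in objects.object_list:
  if object.confidence < 10: # Confidence is on the same 0-100 scale as detection_confidence_threshold
    continue
  # Work with other objects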

Getting 3D Bounding Boxes

Each detected object contains two bounding boxes: a 2D bounding box and a 3D bounding box. The 2D bounding box is defined in the image frame while the 3D bounding box is provided with the depth information.

The 2D bounding box is represented as four 2D points starting from the top left corner of the object. The 3D bounding box is represented by eight 3D points starting from the top left front corner.

The 2D and 3D bounding boxes are accessible in sl::ObjectData:

vector<sl::uint2> object_2Dbbox = object.bounding_box_2d; // Get the 2D bounding box of the object
vector<sl::float3> object_3Dbbox = object.bounding_box; // Get the 3D bounding box of the object
object_2Dbbox = object.bounding_box_2d # Get the 2D bounding box of the object
object_3Dbbox = object.bounding_box # Get the 3D bounding box of the object
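
Because both boxes are plain lists of corner points, simple quantities such as the pixel size of the 2D box or the center of the 3D box can be derived directly from them. The sketch below works on plain Python tuples as stand-ins for object.bounding_box_2d and object.bounding_box. The helper names and the corner values are hypothetical. Both helpers are independent of the corner ordering.

```python
def bbox_2d_size(corners):
    """Return (width, height) of a 2D box given its four corner points."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return max(xs) - min(xs), max(ys) - min(ys)


def bbox_3d_center(corners):
    """Return the centroid of a 3D box as the mean of its eight corner points."""
    n = len(corners)
    return tuple(sum(p[i] for p in corners) / n for i in range(3))


# Hypothetical corner values standing in for the SDK output
box_2d = [(100, 50), (300, 50), (300, 450), (100, 450)]
box_3d = [
    (-0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 3.0), (-0.5, 1.0, 3.0),
    (-0.5, -1.0, 2.0), (0.5, -1.0, 2.0), (0.5, -1.0, 3.0), (-0.5, -1.0, 3.0),
]

print(bbox_2d_size(box_2d))    # prints (200, 400)
print(bbox_3d_center(box_3d))  # prints (0.0, 0.0, 2.5)
```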

Getting the Object Mask

Each object can also be represented by its mask. The mask includes the pixels within the 2D bounding box that belong to the object. Pixels from the object itself are set to 255 while the pixels of the background are set to 0. You can access the mask of an object with sl::Mat object_mask = object.mask;.
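
One way to use the mask is to measure how much of the 2D bounding box the object actually covers. The sketch below is only an illustration under stated assumptions: it treats the mask as a plain nested list of 0 and 255 values rather than an sl::Mat, and mask_coverage is a hypothetical helper, not an SDK function. The mask is only computed when enable_mask_output is set at initialization.

```python
def mask_coverage(mask):
    """Return the fraction of pixels in a 2D mask that are set to 255."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0  # An empty mask covers nothing
    object_pixels = sum(1 for row in mask for value in row if value == 255)
    return object_pixels / total


# Hypothetical 3x4 mask: 255 marks object pixels, 0 marks background
example_mask = [
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 255, 0],
]

# 5 of the 12 pixels belong to the object
print(mask_coverage(example_mask))
```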

Code Example

For code examples, check out the Tutorial and Sample on GitHub.