Object Detection

In this tutorial, you will place virtual boxes around real-world people detected by your ZED 2.

📌 Note: The original ZED camera does not support this feature.

What is 3D Object Detection?

The ZED SDK Object Detection module uses a highly optimized AI model to recognize specific objects (currently people and vehicles) within the video feed. Using depth, it goes a step further than similar algorithms and calculates the object’s 3D position in the world, not just its position within the 2D image.
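To make the role of depth concrete, here is a minimal sketch of how a 2D pixel and a depth value can be turned into a 3D point with the standard pinhole camera model. It is a generic illustration, not a description of what the ZED SDK does internally. The `Intrinsics` class, the `pixel_to_camera_space` function, and the numeric values are all hypothetical and are not real ZED 2 calibration data.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical pinhole camera parameters, all in pixels."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point, x coordinate
    cy: float  # principal point, y coordinate


def pixel_to_camera_space(u, v, depth_m, k):
    """Back-project pixel (u, v) with a depth in meters to a 3D point.

    Uses an image-style convention: x to the right, y down, z forward.
    """
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Illustrative values only (assumed, not measured from a real camera).
k = Intrinsics(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)

# Center of a 2D bounding box, 200 pixels right of the image center, 3 m away.
center = pixel_to_camera_space(1160.0, 540.0, 3.0, k)
print(center)
```

With these assumed numbers, a box center 200 pixels to the right of the image center at a depth of 3 m lands 0.6 m to the right of the camera axis.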

In Unity, this lets us represent real objects/people in a 3D scene with virtual GameObjects. This tutorial will simply add boxes to encompass each person, but this feature could be extended to many applications, such as counting and visualizing occupancy in a zone or having virtual characters react to nearby people.
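As a rough idea of what the occupancy use case could look like, the sketch below counts how many detected positions fall inside a rectangular floor zone. It is written in plain Python and does not call the ZED plugin. The `Zone` class, the `count_occupants` function, and the sample positions are hypothetical, and the coordinates assume a Y-up world like Unity's, so the floor spans the X and Z axes.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical axis-aligned floor area in world space, in meters."""
    min_x: float
    max_x: float
    min_z: float
    max_z: float

    def contains(self, position):
        # Height (y) is ignored: only the footprint on the floor matters.
        x, _, z = position
        return self.min_x <= x <= self.max_x and self.min_z <= z <= self.max_z


def count_occupants(positions, zone):
    """Count the 3D positions whose floor footprint lies inside the zone."""
    return sum(1 for p in positions if zone.contains(p))


# Made-up 3D positions of three detected people, as (x, y, z) in meters.
positions = [(0.5, 0.0, 2.0), (3.2, 0.0, 1.0), (-0.4, 0.0, 3.5)]
zone = Zone(min_x=-1.0, max_x=1.0, min_z=1.0, max_z=4.0)

print(count_occupants(positions, zone))
```

In a real project, the positions would come from the detected objects each frame, and the same check could drive a counter, a heat map, or a trigger for a virtual character.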

Preparing your ZED 2

The Object Detection module can use the position of the floor plane to make additional assumptions, such as keeping boxes in contact with the floor. For this to work, the floor must be visible in the image when the module starts, as in this image:

Setting Up the Scene

Basic Settings

Create a new scene and delete the Main Camera.
In the Project window, go to ZED -> Prefabs and drag ZED_Rig_Mono into the Hierarchy.
Select the new ZED_Rig_Mono in the Hierarchy.
In the Motion Tracking section, make sure Estimate Initial Position is checked. This enables floor detection.
In the Inspector, set the Resolution to 1080p. This is not required, but it increases object detection accuracy.
Set Depth Mode to ULTRA. This is also optional.
If your camera is fixed and will not move, enable Tracking Is Static to prevent incorrect drift throughout the scene.

Object Detection Settings

Scroll down to the Object Detection section.

You don’t need to change the default settings, but let’s take a moment to review them for future reference.

Object Detection Model: The AI model used for detection. To track all kinds of objects, choose between MULTI_CLASS_BOX (fast, but less accurate), MULTI_CLASS_BOX_MEDIUM (slower, but more accurate), and MULTI_CLASS_BOX_ACCURATE (most accurate, but slowest). To track people’s heads, choose PERSON_HEAD_BOX or PERSON_HEAD_ACCURATE.
Enable Segmentation: Enabling this allows scripts to access a 2D image that shows exactly which pixels in an object’s 2D bounding box belong to the object. Since we’re using the 3D bounding boxes only in this tutorial, leave this unchecked to save performance.
Max Detection Range: Defines an upper depth range for detections (in meters).
Allow Reduced Precision Inference: Allows inference to run at a lower precision to improve runtime and memory usage.
Enable Tracking: If enabled, the ZED SDK will track objects between frames, providing more accurate data and giving access to more information, such as velocity.
Filtering Mode: Defines the bounding box preprocessor used. More info here.
Prediction Timeout: Duration during which the SDK will predict the position of a lost object before its state is switched to SEARCHING.

The ZED SDK can detect several classes of objects. Each class has its own copy of the following settings:

Confidence Threshold: Sets the minimum confidence value for a detected object to be published. For example, if set to 40, the ZED SDK needs to be at least 40% confident that a detected object exists. Available for each class.
Class Filter: If enabled, the ZED SDK will detect this class. Available for each class.
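The SDK applies these settings for you, but their combined effect can be illustrated with a short sketch. The plain-Python example below shows how a per-class filter, a per-class confidence threshold, and a maximum detection range might together decide which detections are published. The `Detection` class, the `filter_detections` function, and the sample data are hypothetical and are not part of the ZED SDK or the Unity plugin.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record used only for this illustration."""
    label: str          # object class, e.g. "PERSON"
    confidence: float   # 0 to 100
    distance_m: float   # distance from the camera, in meters


def filter_detections(detections, thresholds, max_range_m):
    """Keep detections that pass the class filter, threshold, and range.

    `thresholds` maps each enabled class to its minimum confidence.
    A class that is missing from the mapping is treated as disabled.
    """
    kept = []
    for d in detections:
        if d.label not in thresholds:
            continue  # class filter: this class is turned off
        if d.confidence < thresholds[d.label]:
            continue  # below the confidence threshold for its class
        if d.distance_m > max_range_m:
            continue  # beyond the maximum detection range
        kept.append(d)
    return kept


detections = [
    Detection("PERSON", 85.0, 3.0),    # kept
    Detection("PERSON", 25.0, 2.0),    # dropped: confidence too low
    Detection("VEHICLE", 90.0, 5.0),   # dropped: class disabled
    Detection("PERSON", 70.0, 25.0),   # dropped: too far away
]

published = filter_detections(detections, {"PERSON": 40.0}, max_range_m=20.0)
print([(d.label, d.confidence) for d in published])
```

Raising a threshold trades missed detections for fewer false positives, so it is worth tuning per class rather than using a single value everywhere.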

Last is the “Start Object Detection” button. Normally, the Object Detection module doesn’t start when the ZED does because it causes a long delay. This button is one of two ways to start the module. The other is via script, which we’ll be doing here.

Adding Visuals

Create a new empty GameObject in the Hierarchy and rename it “3D Object Visualizer”.
Add the ZED3DObjectVisualizer component to it.
In the Project window, go to ZED -> Examples -> Object Detection -> Prefabs.
Drag the 3D Bounding Box prefab into Bounding Box Prefab in the Inspector.

📌 Note: “Start Object Detection Automatically” is checked by default. This will call the function in ZEDManager to initialize the Object Detection module as soon as the ZED 2 itself is ready.

Run the Scene

Double-check that your ZED can see the floor, then run the scene. After a short initialization period, the app will pause for 10-20 seconds while the Object Detection module loads.

Once it does, step into the ZED’s view. You should see yourself enclosed in a box. Walk around, crouch, wave your arms, and the box will transform itself accordingly. Bring some friends into view to see it track multiple people at once.