Positional Tracking Modes
The Stereolabs Positional Tracking module uses visual-inertial SLAM (VSLAM) to deliver precise, real-time estimates of the camera’s 3D position and orientation. By fusing stereo vision with IMU data, it builds and refines a 3D map of the environment while simultaneously tracking motion within it. This approach ensures accurate, drift-minimized localization—even in dynamic or visually challenging conditions—making it suitable for demanding robotics, AR/VR, and autonomous navigation applications.
The ZED SDK includes multiple generations of positional tracking algorithms, GEN_1 and GEN_3¹, each designed to meet different performance and accuracy requirements.
The positional tracking module supports two localization strategies:
- VIO (Visual-Inertial Odometry): performs pure localization without using a prior map.
- Relocalization: performs localization within an area map, allowing the system to recognize known environments and maintain consistent tracking across sessions.
These modes and localization strategies allow you to choose the best balance between computational load, precision, and robustness for your specific use case.
¹ The GEN_2 positional tracking generation has been deprecated and is no longer supported in recent ZED SDK releases.
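The choice between generations and strategies described above can be sketched as a small decision helper. This is an illustrative sketch only: `TrackingChoice`, `choose_tracking`, and the `prefer_dense_depth` flag are hypothetical names, not part of the ZED SDK, and the rules simply restate the recommendations on this page (GEN_3 for both strategies, GEN_1 mainly for VIO).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingChoice:
    """Hypothetical container for a generation/strategy pair (not a ZED SDK type)."""
    generation: str  # "GEN_1" or "GEN_3", as named on this page
    strategy: str    # "VIO" or "RELOCALIZATION"


def choose_tracking(needs_area_map: bool, prefer_dense_depth: bool = False) -> TrackingChoice:
    """Pick a generation and strategy from two coarse requirements.

    needs_area_map: the application must localize within a saved area map
        (persistent mapping or multi-session relocalization).
    prefer_dense_depth: assumption for this sketch; set it when the scene is
        low-texture and the dense, depth-based GEN_1 approach is preferred.
    """
    if needs_area_map:
        # GEN_1 was not designed for loop closure or area-map relocalization.
        return TrackingChoice("GEN_3", "RELOCALIZATION")
    if prefer_dense_depth:
        return TrackingChoice("GEN_1", "VIO")
    # GEN_3 is the recommended default for VIO as well.
    return TrackingChoice("GEN_3", "VIO")


if __name__ == "__main__":
    print(choose_tracking(needs_area_map=True))
    print(choose_tracking(needs_area_map=False, prefer_dense_depth=True))
```

In a real application the returned names would be mapped onto the SDK's own tracking parameters; that mapping is deliberately left out here because it depends on the SDK version in use.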
GEN_1 mode
GEN_1 implements a dense VSLAM approach that leverages depth data from the stereo camera to estimate motion. It generates a dense representation directly from depth information rather than sparse keypoints, enabling more stable and drift-resistant tracking in low-texture or feature-poor areas.
The fusion of stereo vision and IMU data improves robustness against motion blur and rapid movements. Originally designed for AR/VR applications, GEN_1 delivers smooth and consistent positional tracking, especially during fast or complex movements.
📌 Note:
GEN_1 is optimized primarily for VIO (Visual-Inertial Odometry) mode. While it excels at real-time localization and smooth tracking, its dense VSLAM architecture was not designed with loop closure or area map relocalization in mind. For applications requiring persistent mapping or multi-session relocalization, GEN_3 is the recommended choice.
GEN_1 Load Performances
| Depth Mode | Max FPS | CPU (%) | GPU (%) |
|---|---|---|---|
| Neural Light | Depth Compute FPS | 13 | 77 |
| Neural | Depth Compute FPS | 8 | 80 |
| Neural Plus | Depth Compute FPS | 4 | 94 |
📌 Note: performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and ZED X camera using the positional tracking sample available on GitHub. Because GEN_1 is depth-dependent, performance varies with the chosen depth mode. More information on depth modes is available in the depth modes documentation.
GEN_1 Accuracy Performances
| Depth Mode | Mean APE [m] | Max APE [m] | Specific Scenario Dataset |
|---|---|---|---|
| Neural Light / Neural / Neural Plus | 0.8 / 0.8 / 0.62 | 1.8 / 1.78 / 1.47 | Indoor warehouse environments with reflective lights |
| Neural Light / Neural / Neural Plus | 3.38 / 3.9 / 2.45 | 7.76 / 8.27 / 5.7 | Structured outdoor scenes during the day |
📌 Note: performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and ZED X camera using the positional tracking sample available on GitHub. Each value represents the average result from a series of real-world tests collected under a specific scenario dataset (e.g., indoor, outdoor). The average test sequence spans approximately 400 m.
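For readers unfamiliar with the metric, the mean and max APE (absolute pose error) columns can be illustrated with a minimal sketch. It assumes the common translational definition, the Euclidean distance between each estimated position and the matching ground-truth position, and it assumes both trajectories are already time-synchronized and expressed in the same frame. The page does not state how the published figures were computed, so `ape_stats` is a hypothetical helper rather than the evaluation tool used for the tables.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def ape_stats(estimated: Sequence[Point3], ground_truth: Sequence[Point3]) -> Tuple[float, float]:
    """Return (mean APE, max APE) in the unit of the input positions.

    Both sequences must have the same length and be index-aligned:
    estimated[i] and ground_truth[i] describe the same instant.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-pose translational error: straight-line distance between the two positions.
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    return sum(errors) / len(errors), max(errors)


if __name__ == "__main__":
    # Toy trajectory in metres (made-up values, not measured data).
    truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    estimate = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.4), (2.0, 0.0, 1.0)]
    mean_ape, max_ape = ape_stats(estimate, truth)
    print(f"mean APE = {mean_ape:.2f} m, max APE = {max_ape:.2f} m")
```

Real evaluations usually add a trajectory alignment step before computing the error; it is omitted here to keep the definition visible.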
GEN_3 mode
GEN_3 introduces a scalable, feature-based VSLAM pipeline built for robustness and precision. By extracting and tracking high-quality visual features, rather than relying on GEN_1's dense mapping, it maintains an accurate, lightweight map with strong loop closure and global optimization capabilities. This minimizes drift, improves long-term consistency, and delivers fast and reliable relocalization.
Combined with visual-inertial fusion, GEN_3 adapts seamlessly to large, dynamic, or revisited environments—ideal for advanced robotics and autonomous navigation applications.
📌 Note:
GEN_3 is recommended for both the VIO and relocalization strategies.
GEN_3 Load Performances
| Max FPS | CPU (%) | GPU (%) |
|---|---|---|
| up to 80 FPS | 13 | 44 |
📌 Note: performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and ZED X camera using the positional tracking sample available on GitHub.
GEN_3 Accuracy Performances
| Mean APE [m] | Max APE [m] | Specific Scenario Dataset |
|---|---|---|
| 0.56 | 1.2 | Indoor warehouse environments with reflective lights |
| 0.29 | 0.58 | Structured outdoor scenes during the day |
📌 Note: performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and ZED X camera using the positional tracking sample available on GitHub. Each value represents the average result from a series of real-world tests collected under a specific scenario dataset (e.g., indoor, outdoor). The average test sequence spans approximately 400 m.
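To put the accuracy figures in perspective, the short sketch below normalizes the mean APE values from the GEN_1 and GEN_3 tables by the approximate 400 m sequence length and compares each GEN_1 depth mode against GEN_3. The numbers are copied from this page. Dividing by path length is a rough normalization chosen for this example, not a metric published by Stereolabs, and the function names are hypothetical.

```python
# Mean APE in metres, copied from the accuracy tables on this page.
MEAN_APE_M = {
    "indoor": {
        "GEN_1 Neural Light": 0.8,
        "GEN_1 Neural": 0.8,
        "GEN_1 Neural Plus": 0.62,
        "GEN_3": 0.56,
    },
    "outdoor": {
        "GEN_1 Neural Light": 3.38,
        "GEN_1 Neural": 3.9,
        "GEN_1 Neural Plus": 2.45,
        "GEN_3": 0.29,
    },
}

# Approximate average test sequence length quoted in the notes.
SEQUENCE_LENGTH_M = 400.0


def ape_percent_of_path(mean_ape_m: float, path_length_m: float = SEQUENCE_LENGTH_M) -> float:
    """Express a mean APE as a percentage of the travelled distance."""
    return 100.0 * mean_ape_m / path_length_m


def ratio_vs_gen3(scenario: str) -> dict:
    """Return mean APE of each GEN_1 depth mode divided by the GEN_3 mean APE."""
    results = MEAN_APE_M[scenario]
    gen3 = results["GEN_3"]
    return {name: value / gen3 for name, value in results.items() if name != "GEN_3"}


if __name__ == "__main__":
    for scenario, results in MEAN_APE_M.items():
        for name, ape in results.items():
            print(f"{scenario:8s} {name:20s} {ape:5.2f} m  {ape_percent_of_path(ape):.3f} % of path")
        for name, ratio in ratio_vs_gen3(scenario).items():
            print(f"{scenario:8s} {name:20s} {ratio:.2f}x the GEN_3 mean APE")
```

The ratios are indicative only: the tables report averages over scenario datasets, so individual sequences may differ.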
Positional Tracking Modes Comparison
| Positional Tracking Generation | Benefits | Limitations |
|---|---|---|
| GEN_1 | | |
| GEN_3 | | |