Positional Tracking Modes

The Stereolabs Positional Tracking module uses visual-inertial SLAM (VSLAM) to estimate the camera's 3D position and orientation in real time. By fusing stereo vision with IMU data, it builds and refines a 3D map of the environment while simultaneously tracking motion within it. This approach keeps localization drift low, even in dynamic or visually challenging conditions, which makes it suitable for demanding robotics, AR/VR, and autonomous navigation applications.

The ZED SDK includes multiple generations of positional tracking algorithms (GEN_1 and GEN_3 ¹), each designed to meet different performance and accuracy requirements.

The positional tracking module supports two localization strategies:

VIO (Visual-Inertial Odometry): performs pure localization without using a prior map.
Relocalization: performs localization within an area map, allowing the system to recognize known environments and maintain consistent tracking across sessions.

These modes and localization strategies allow you to choose the best balance between computational load, precision, and robustness for your specific use case.

¹ The GEN_2 positional tracking generation has been deprecated and will be removed in a future release.

`GEN_1` mode

GEN_1 implements a dense VSLAM approach that leverages depth data from the stereo camera to estimate motion. It generates a dense representation directly from depth information rather than sparse keypoints, enabling more stable and drift-resistant tracking in low-texture or feature-poor areas.

The fusion of stereo vision and IMU data improves robustness against motion blur and rapid movements. Originally designed for AR/VR applications, GEN_1 delivers smooth and consistent positional tracking, especially during fast or complex movements.

GEN_1 is optimized primarily for the VIO (Visual-Inertial Odometry) localization strategy. While it excels at real-time localization and smooth tracking, its dense VSLAM architecture was not designed with loop closure or area map relocalization in mind. For applications requiring persistent mapping or multi-session relocalization, GEN_3 is the recommended choice.

`GEN_1` Load Performance

Depth Mode	Max FPS	CPU (%)	GPU (%)
Neural Light	Depth Compute FPS	13	77
Neural	Depth Compute FPS	8	80
Neural Plus	Depth Compute FPS	4	94

Performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and a ZED X camera using the positional tracking sample available on GitHub. Because GEN_1 depends on depth, its maximum frame rate follows the depth computation rate ("Depth Compute FPS" in the table), and CPU and GPU load vary with the chosen depth mode. More information on depth modes is available on this page.

`GEN_1` Accuracy Performance

Depth Mode	Mean APE [m]	Max APE [m]	Scenario Dataset
Neural Light / Neural / Neural Plus	0.8 / 0.8 / 0.62	1.8 / 1.78 / 1.47	Indoor warehouse environments with reflective lights
Neural Light / Neural / Neural Plus	3.38 / 3.9 / 2.45	7.76 / 8.27 / 5.7	Structured outdoor scenes during the day

Performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and ZED X camera using the positional tracking sample available on GitHub. Each value represents the average result from a series of real-world tests collected under a specific scenario dataset (e.g., indoor, outdoor). The average test sequence spans approximately 400 m.
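The page does not describe how the APE figures were computed. As a rough illustration, the sketch below assumes APE is the translational absolute pose error: the Euclidean distance between each estimated position and its ground-truth position, with both trajectories already time-synchronized and expressed in the same reference frame. The function names and the toy trajectories are hypothetical and are not part of the ZED SDK or of the benchmark.

```python
import math


def translational_ape(estimated, ground_truth):
    """Return (mean, max) position error in metres.

    Assumption: both inputs are equal-length sequences of (x, y, z)
    positions that are already aligned in time and reference frame.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    return sum(errors) / len(errors), max(errors)


def path_length(positions):
    """Sum of the distances between consecutive positions, in metres."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


# Toy data for illustration only (not measurements from this page).
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
estimated = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4), (3.0, 0.0, 0.5)]

mean_ape, max_ape = translational_ape(estimated, ground_truth)
length_m = path_length(ground_truth)
print(f"mean APE = {mean_ape:.2f} m, max APE = {max_ape:.2f} m over {length_m:.1f} m")
```

Dividing the mean APE by the path length gives a scale-free figure. For example, a mean APE of 0.8 m over a sequence of approximately 400 m corresponds to roughly 0.2% of the distance travelled.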

`GEN_3` mode

GEN_3 introduces a scalable, feature-based VSLAM pipeline built for robustness and precision. By extracting and tracking high-quality visual features—rather than relying on GEN_1’s dense mapping—it maintains an accurate, lightweight map with strong loop closure and global optimization capabilities. This minimizes drift, improves long-term consistency, and delivers fast and reliable relocalization.

Combined with visual-inertial fusion, GEN_3 adapts seamlessly to large, dynamic, or revisited environments—ideal for advanced robotics and autonomous navigation applications.

GEN_3 is recommended for both localization strategies: VIO and relocalization.
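The guidance on this page can be summarized as a small decision helper. The sketch below is illustrative only: `Generation`, `Strategy`, `recommend_generation`, and the `favor_dense_tracking` flag are hypothetical names that mirror the terms used here, not ZED SDK types or parameters.

```python
from enum import Enum


class Generation(Enum):
    # Hypothetical labels mirroring the generations named on this page.
    GEN_1 = "GEN_1"
    GEN_3 = "GEN_3"


class Strategy(Enum):
    VIO = "VIO"
    RELOCALIZATION = "Relocalization"


def recommend_generation(strategy, favor_dense_tracking=False):
    """Encode the recommendations stated on this page.

    - Relocalization within an area map: GEN_3, because GEN_1 was not
      designed for loop closure or area map relocalization.
    - VIO: GEN_3 by default; GEN_1 only when the caller explicitly favors
      its dense, depth-based tracking (for example low-texture scenes).
    """
    if strategy is Strategy.RELOCALIZATION:
        return Generation.GEN_3
    if favor_dense_tracking:
        return Generation.GEN_1
    return Generation.GEN_3


print(recommend_generation(Strategy.RELOCALIZATION).value)
print(recommend_generation(Strategy.VIO, favor_dense_tracking=True).value)
```

The flag is deliberately ignored for relocalization, since the page recommends GEN_3 whenever persistent mapping or multi-session relocalization is required.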

`GEN_3` Load Performance

Platform	Max FPS	CPU (%)	GPU (%)
Jetson™ Orin NX 16	up to 80 FPS	13	44

Performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and ZED X camera using the positional tracking sample available on GitHub.

`GEN_3` Accuracy Performance

Mode	Mean APE [m]	Max APE [m]	Scenario Dataset
VIO	0.56	1.2	Indoor warehouse environments with reflective lights
VIO	0.29	0.58	Structured outdoor scenes during the day
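To put the two accuracy tables side by side, the sketch below computes how much lower the GEN_3 mean APE is than the best GEN_1 depth mode in each scenario. The numbers are copied from the tables on this page. The comparison assumes that both generations were evaluated on the same sequences, which the page suggests through identical scenario labels but does not state explicitly. The dictionary layout and function name are illustrative.

```python
# Mean APE in metres, copied from the GEN_1 and GEN_3 accuracy tables.
MEAN_APE_M = {
    "indoor warehouse": {
        "GEN_1": {"Neural Light": 0.8, "Neural": 0.8, "Neural Plus": 0.62},
        "GEN_3": 0.56,
    },
    "outdoor daytime": {
        "GEN_1": {"Neural Light": 3.38, "Neural": 3.9, "Neural Plus": 2.45},
        "GEN_3": 0.29,
    },
}


def reduction_percent(baseline_m, candidate_m):
    """Relative error reduction of candidate versus baseline, in percent."""
    if baseline_m <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_m - candidate_m) / baseline_m


reductions = {}
for scenario, values in MEAN_APE_M.items():
    best_gen1_m = min(values["GEN_1"].values())  # most accurate GEN_1 depth mode
    reductions[scenario] = reduction_percent(best_gen1_m, values["GEN_3"])
    print(f"{scenario}: GEN_3 mean APE is {reductions[scenario]:.1f}% lower "
          f"than the best GEN_1 depth mode")
```

With these figures the gap is modest indoors (about 10%) and large outdoors (about 88%).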