Positional Tracking Modes

The Stereolabs Positional Tracking module uses visual-inertial SLAM (VSLAM) to deliver precise, real-time estimates of the camera’s 3D position and orientation. By fusing stereo vision with IMU data, it builds and refines a 3D map of the environment while simultaneously tracking motion within it. This approach ensures accurate, drift-minimized localization—even in dynamic or visually challenging conditions—making it suitable for demanding robotics, AR/VR, and autonomous navigation applications.

The ZED SDK includes multiple generations of positional tracking algorithms — GEN_1 and GEN_3¹ — each designed to meet different performance and accuracy requirements.

The positional tracking module supports two localization strategies:

  • VIO (Visual-Inertial Odometry) — performs pure localization without relying on a prior map.

  • Relocalization — performs localization within an area map, allowing the system to recognize known environments and maintain consistent tracking across sessions.

These modes and localization strategies allow you to choose the best balance between computational load, precision, and robustness for your specific use case.
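For reference, the sketch below shows how these choices could map onto the ZED SDK Python API (pyzed). It is a hedged illustration, not an official sample: the enum and attribute names used here (`PositionalTrackingParameters.mode`, `sl.POSITIONAL_TRACKING_MODE`, `enable_area_memory`, `area_file_path`) are assumptions based on recent SDK versions — verify them against the API reference of your installed SDK.

```python
# Hedged sketch, not an official Stereolabs sample: enabling positional
# tracking and choosing a generation / localization strategy with pyzed.
# Enum and attribute names are assumptions based on recent ZED SDK versions.
try:
    import pyzed.sl as sl
except ImportError:
    sl = None  # ZED SDK not installed; the sketch below needs it to run

def start_tracking(use_relocalization: bool = False):
    cam = sl.Camera()
    if cam.open(sl.InitParameters()) != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError("could not open the camera")

    params = sl.PositionalTrackingParameters()
    params.mode = sl.POSITIONAL_TRACKING_MODE.GEN_3      # or GEN_1
    if use_relocalization:
        params.enable_area_memory = True                 # build/use an area map
        params.area_file_path = "office.area"            # reuse across sessions
    else:
        params.enable_area_memory = False                # pure VIO, no prior map
    cam.enable_positional_tracking(params)

    # Poll the camera pose in the world reference frame.
    pose = sl.Pose()
    while cam.grab() == sl.ERROR_CODE.SUCCESS:
        state = cam.get_position(pose, sl.REFERENCE_FRAME.WORLD)
        if state == sl.POSITIONAL_TRACKING_STATE.OK:
            tx, ty, tz = pose.get_translation().get()
            print(f"position: {tx:.3f} {ty:.3f} {tz:.3f}")

if __name__ == "__main__" and sl is not None:
    start_tracking(use_relocalization=False)
```

The only difference between the two strategies in this sketch is the area-memory configuration; the pose-polling loop is identical for both.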

¹ The GEN_2 positional tracking generation has been deprecated and is no longer supported in recent ZED SDK releases.

GEN_1 mode #

GEN_1 implements a dense VSLAM approach that leverages depth data from the stereo camera to estimate motion. It generates a dense representation directly from depth information rather than sparse keypoints, enabling more stable and drift-resistant tracking in low-texture or feature-poor areas.

The fusion of stereo vision and IMU data improves robustness against motion blur and rapid movements. Originally designed for AR/VR applications, GEN_1 delivers smooth and consistent positional tracking, especially during fast or complex movements.

📌 Note: GEN_1 is optimized primarily for VIO (Visual-Inertial Odometry) mode. While it excels at real-time localization and smooth tracking, its dense VSLAM architecture was not designed with loop closure or area map relocalization in mind. For applications requiring persistent mapping or multi-session relocalization, GEN_3 is the recommended choice.

GEN_1 Load Performances #

| Depth Mode | Max FPS | CPU (%) | GPU (%) |
|---|---|---|---|
| Neural Light | Depth Compute FPS | 13 | 77 |
| Neural | Depth Compute FPS | 8 | 80 |
| Neural Plus | Depth Compute FPS | 4 | 94 |

📌 Note: performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and a ZED X camera using the positional tracking sample available on GitHub. Because GEN_1 is depth dependent, performance varies with the chosen depth mode. More information on depth modes is available on this page.

GEN_1 Accuracy Performances #

| Depth Mode | Mean APE [m] | Max APE [m] | Specific Scenario Dataset |
|---|---|---|---|
| Neural Light / Neural / Neural Plus | 0.8 / 0.8 / 0.62 | 1.8 / 1.78 / 1.47 | Indoor warehouse environments with reflective lights |
| Neural Light / Neural / Neural Plus | 3.38 / 3.9 / 2.45 | 7.76 / 8.27 / 5.7 | Structured outdoor scenes during the day |

📌 Note: performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and ZED X camera using the positional tracking sample available on GitHub. Each value represents the average result from a series of real-world tests collected under a specific scenario dataset (e.g., indoor, outdoor). The average test sequence spans approximately 400 m.
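The APE (Absolute Pose Error) figures above compare estimated positions against time-aligned ground truth. As a minimal illustration of the metric itself — this is not Stereolabs' evaluation code — the translational APE of a trajectory can be computed as the Euclidean distance between each estimated position and the corresponding ground-truth position:

```python
# Illustrative computation of translational Absolute Pose Error (APE):
# the Euclidean distance between each estimated position and the
# time-aligned ground-truth position. Not Stereolabs' evaluation code.
import numpy as np

def ape_stats(estimated, ground_truth):
    """Return (mean APE, max APE) in the same unit as the inputs (metres here).

    estimated, ground_truth: (N, 3) arrays of time-aligned xyz positions.
    """
    errors = np.linalg.norm(np.asarray(estimated) - np.asarray(ground_truth), axis=1)
    return errors.mean(), errors.max()

# Toy trajectory: the estimate drifts 1 cm per step along x.
gt = np.array([[i * 0.1, 0.0, 0.0] for i in range(5)])
est = gt + np.array([[i * 0.01, 0.0, 0.0] for i in range(5)])
mean_ape, max_ape = ape_stats(est, gt)
print(f"mean APE = {mean_ape:.3f} m, max APE = {max_ape:.3f} m")  # 0.020 m, 0.040 m
```

In practice, trajectory alignment (e.g. with a rigid-body fit) is applied before computing the distances; the sketch assumes the trajectories are already expressed in the same frame.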

GEN_3 mode #

GEN_3 introduces a scalable, feature-based VSLAM pipeline built for robustness and precision. By extracting and tracking high-quality visual features—rather than relying on GEN_1’s dense mapping—it maintains an accurate, lightweight map with strong loop closure and global optimization capabilities. This minimizes drift, improves long-term consistency, and delivers fast and reliable relocalization.

Combined with visual-inertial fusion, GEN_3 adapts seamlessly to large, dynamic, or revisited environments—ideal for advanced robotics and autonomous navigation applications.

📌 Note: GEN_3 is recommended for both the VIO and relocalization localization strategies.

GEN_3 Load Performances #

| Max FPS | CPU (%) | GPU (%) |
|---|---|---|
| up to 80 FPS | 13 | 44 |

📌 Note: performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and ZED X camera using the positional tracking sample available on GitHub.

GEN_3 Accuracy Performances #

| Mean APE [m] | Max APE [m] | Specific Scenario Dataset |
|---|---|---|
| 0.56 | 1.2 | Indoor warehouse environments with reflective lights |
| 0.29 | 0.58 | Structured outdoor scenes during the day |

📌 Note: performance obtained with ZED SDK v5.1.1, ZED X Driver v1.3.2, and ZED X camera using the positional tracking sample available on GitHub. Each value represents the average result from a series of real-world tests collected under a specific scenario dataset (e.g., indoor, outdoor). The average test sequence spans approximately 400 m.

Positional Tracking Modes Comparison #

| Positional Tracking Generation | Benefits | Limitations |
|---|---|---|
| GEN_1 | • Well-suited for environments with limited visual information (low texture, low light, repetitive structures).<br>• Reliable for open-field outdoor robotics or inspection tasks.<br>• Optimized for AR/VR applications. | • Depends heavily on depth information, resulting in higher computational requirements.<br>• Provides fewer opportunities for loop closure, making it less effective in feature-rich indoor environments.<br>• Only suited for the VIO localization strategy; not recommended for relocalization. |
| GEN_3 | • Designed for feature-rich environments such as indoor facilities, warehouses, offices, and structured outdoor scenes.<br>• Offers advanced loop closure and relocalization, making it ideal for area mapping and long-term navigation.<br>• Provides a lightweight and efficient tracking pipeline that scales well to large and complex spaces.<br>• Well suited for both VIO and relocalization strategies, without the need for depth. | • May be less robust in severely degraded visual conditions where very few features are available. |
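The trade-offs above can be condensed into a rough rule of thumb. The helper below is purely illustrative — it is a hypothetical function, not part of the ZED SDK — and simply encodes the comparison table's reasoning:

```python
# Illustrative rule of thumb distilled from the comparison table above;
# this helper is hypothetical and not part of the ZED SDK.
def recommend_generation(needs_relocalization: bool, low_texture_scene: bool) -> str:
    if needs_relocalization:
        return "GEN_3"   # GEN_1 is not recommended for relocalization
    if low_texture_scene:
        return "GEN_1"   # dense, depth-based tracking copes with sparse features
    return "GEN_3"       # lightweight feature-based pipeline for feature-rich scenes

# A warehouse robot that must relocalize in a previously mapped building:
print(recommend_generation(needs_relocalization=True, low_texture_scene=False))  # GEN_3
```

Note that relocalization takes precedence in this sketch: even in a low-texture scene, GEN_1's lack of area-map relocalization rules it out when persistent mapping is required.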