Depth Sensing Overview

The ZED stereo camera reproduces the way human binocular vision works. Human eyes are horizontally separated by about 65 mm on average. Thus, each eye has a slightly different view of the world around. By comparing these two views, our brain can infer not only depth but also 3D motion in space.

Likewise, Stereolabs’ stereo cameras have two eyes separated by 6 to 12 cm allowing them to capture high-resolution 3D video of the scene and estimate depth and motion by comparing the displacement of pixels between the left and right images.

Depth Perception #

Depth perception is the ability to determine distances between objects and see the world in three dimensions. Up until now, depth sensors have been limited to perceiving depth at short range and indoors, restricting their application to gesture control and body tracking. Using stereo vision, the ZED is the first universal depth sensor:

  • Depth can be captured at longer ranges, up to 20m.
  • Frame rate of depth capture can be as high as 100 FPS.
  • Field of view is much larger, up to 110° (H) x 70° (V).
  • The camera works indoors and outdoors, contrary to active sensors such as structured-light or time of flight.

Depth Map #

Depth maps captured by the ZED store a distance value (Z) for each pixel (X, Y) in the image. The distance is expressed in metric units (meters for example) and calculated from the back of the left eye of the camera to the scene object.

Depth maps cannot be displayed directly as they are encoded on 32 bits. To display the depth map, a monochrome (grayscale) 8-bit representation is necessary with values between [0, 255], where 255 represents the closest possible depth value and 0 is the most distant possible depth value.

3D Point Cloud #

Another common way of representing depth information is by a 3D point cloud. A point cloud can be seen as a depth map in three dimensions. While a depth map only contains the distance or Z information for each pixel, a point cloud is a collection of 3D points (X,Y,Z) that represent the external surface of the scene and can contain color information.

For more information on Depth Sensing, see the Using the API page.