How to Use PyTorch with ZED


The ZED SDK can be interfaced with a PyTorch project to add 3D localization of objects detected with a custom neural network. In this tutorial, we will combine Mask R-CNN with the ZED SDK to detect, segment, classify and locate objects in 3D using a ZED stereo camera and PyTorch.
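
To make the idea of 3D localization concrete, here is a minimal sketch of how a 2D segmentation mask could be combined with a per-pixel point cloud to estimate an object's 3D position. It is not the sample's actual code: the function name `locate_object_3d`, the nested-list data layout, and the use of a per-axis median are assumptions made for illustration. In practice the mask would come from Mask R-CNN and the point cloud from the ZED SDK, most likely as arrays rather than Python lists.

```python
import math
from statistics import median


def locate_object_3d(mask, point_cloud):
    """Estimate an object's 3D position from a mask and a point cloud.

    mask:        rows of booleans, True where the object was segmented.
    point_cloud: rows of (x, y, z) tuples aligned with the mask; invalid
                 depth measurements are expected to be NaN or infinite.
    Returns the per-axis median (x, y, z), or None if no valid point exists.
    """
    xs, ys, zs = [], [], []
    for mask_row, cloud_row in zip(mask, point_cloud):
        for inside, (x, y, z) in zip(mask_row, cloud_row):
            # Keep only pixels inside the mask that have a valid 3D measurement.
            if inside and all(math.isfinite(v) for v in (x, y, z)):
                xs.append(x)
                ys.append(y)
                zs.append(z)
    if not xs:
        return None
    # The median is less sensitive than the mean to depth outliers at mask edges.
    return (median(xs), median(ys), median(zs))


if __name__ == "__main__":
    nan = float("nan")
    # Hypothetical 2x2 frame: three masked pixels, one with invalid depth.
    mask = [[True, True], [False, True]]
    cloud = [[(0.0, 0.0, 2.0), (0.2, 0.0, 2.2)],
             [(9.0, 9.0, 9.0), (nan, nan, nan)]]
    print(locate_object_3d(mask, cloud))
```

The unmasked pixel and the pixel with invalid depth are both ignored, so only the two valid masked points contribute to the estimate.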


The Mask R-CNN 3D project depends on the following libraries:

  • ZED SDK and Python API
  • PyTorch (with cuDNN)
  • OpenCV
  • CUDA
  • Python 3


  • Install the ZED SDK and Python API.

  • Download the sample project code from GitHub. Run the following commands from within the sample directory.

PyTorch Installation

A dedicated environment can be created to set up PyTorch.

$ conda create --name pytorch1 -y
$ conda activate pytorch1

When installing PyTorch, make sure the selected CUDA version matches the one used by the ZED SDK.

$ conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
$ conda install --yes --file requirements.txt

Using Pip

$ pip3 install torch torchvision
$ pip3 install -r requirements.txt

For more information, please refer to the PyTorch setup page.
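
Once PyTorch is installed, the CUDA version it was built against can be compared with the one used by the ZED SDK. The following is a minimal sketch of such a check. The helper `cuda_versions_match` is hypothetical, and the value `"10.0"` for the ZED SDK is an assumption taken from the conda command above; replace it with the version of your own installation. `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch calls.

```python
def cuda_versions_match(pytorch_cuda, zed_sdk_cuda):
    """Return True when both version strings share the same major.minor."""
    if not pytorch_cuda or not zed_sdk_cuda:
        # A CPU-only PyTorch build reports no CUDA version (None).
        return False

    def major_minor(version):
        return tuple(version.strip().split(".")[:2])

    return major_minor(pytorch_cuda) == major_minor(zed_sdk_cuda)


if __name__ == "__main__":
    # Assumption: the ZED SDK was installed for CUDA 10.0.
    zed_sdk_cuda = "10.0"
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("PyTorch CUDA version:", torch.version.cuda)
        print("CUDA device available:", torch.cuda.is_available())
        print("Matches ZED SDK:",
              cuda_versions_match(torch.version.cuda, zed_sdk_cuda))
```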

Mask R-CNN Installation

Make sure the Conda environment is active.

$ conda activate pytorch1

Then set up Mask R-CNN:

$ python3 install

Running Mask R-CNN 3D

Run the code with python3. Objects captured by your ZED camera should now be detected with the Mask R-CNN ResNet 50 model and localized in 3D.

$ python --config-file ../configs/caffe2/e2e_mask_rcnn_R_50_C4_1x_caffe2.yaml --min-image-size 256

Testing Other Models

Other pre-trained models are available, and selected models are downloaded automatically. Here we test Mask R-CNN with ResNet 101.

$ python --config-file ../configs/caffe2/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml --min-image-size 300

Now let’s test 3D keypoint extraction:

$ python --config-file ../configs/caffe2/e2e_keypoint_rcnn_R_50_FPN_1x_caffe2.yaml --min-image-size 300

Other Options

You can launch object segmentation on recorded videos in SVO format using the following command:

$ python --svo-filename path/to/svo_file.svo

The best accuracy is obtained with --min-image-size 800, at the cost of a lower frame rate.

$ python --min-image-size 800
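
The cost of a larger minimum image size can be estimated with a small calculation. A minimum image size is typically applied by rescaling the frame so that its shorter side equals the requested value, so the number of pixels the network processes grows roughly with the square of that value. The sketch below illustrates this under assumptions: the helper `resized_shape` is hypothetical, it ignores any maximum-size cap the model configuration may apply, and the 1280x720 input resolution is only an example.

```python
def resized_shape(width, height, min_image_size):
    """Scale (width, height) so that the shorter side equals min_image_size."""
    scale = min_image_size / min(width, height)
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    # Compare the pixel count for the sizes used in this tutorial.
    for size in (256, 300, 800):
        w, h = resized_shape(1280, 720, size)
        print(f"--min-image-size {size}: {w}x{h} ({w * h} pixels)")
```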

To display heatmaps, use --show-mask-heatmaps.

$ python --min-image-size 300 --show-mask-heatmaps

Finally, to run the model on CPU, use MODEL.DEVICE cpu.

$ python --min-image-size 300 MODEL.DEVICE cpu
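
The options above can be combined in a single command. As an illustration of how such a command line might be parsed, here is a minimal `argparse` sketch. It is an assumption about structure, not the sample's actual parser: the default values, the `build_parser` and `parse_overrides` helpers, and the handling of trailing `KEY VALUE` pairs such as `MODEL.DEVICE cpu` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the options used in this tutorial (sketch)."""
    parser = argparse.ArgumentParser(description="Mask R-CNN 3D options (sketch)")
    parser.add_argument("--config-file", metavar="FILE",
                        help="model configuration file")
    parser.add_argument("--min-image-size", type=int, default=256,
                        help="shorter side of the image fed to the network")
    parser.add_argument("--svo-filename", metavar="FILE",
                        help="recorded SVO video to replay")
    parser.add_argument("--show-mask-heatmaps", action="store_true",
                        help="display mask heatmaps")
    # Everything left over is collected as KEY VALUE configuration overrides.
    parser.add_argument("opts", nargs=argparse.REMAINDER,
                        help="overrides such as MODEL.DEVICE cpu")
    return parser


def parse_overrides(opts):
    """Turn ['KEY', 'VALUE', ...] into a {'KEY': 'VALUE'} dictionary."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must be given as KEY VALUE pairs")
    return dict(zip(opts[0::2], opts[1::2]))


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--min-image-size", "300", "--show-mask-heatmaps", "MODEL.DEVICE", "cpu"])
    print(args.min_image_size, args.show_mask_heatmaps, parse_overrides(args.opts))
```

Collecting the trailing arguments separately keeps the named flags independent of the configuration overrides, which must come last on the command line.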