How to Use PyTorch with ZED

Introduction

The ZED SDK can be interfaced with a PyTorch project to add 3D localization of objects detected with a custom neural network. In this tutorial, we will combine Mask R-CNN with the ZED SDK to detect, segment, classify and locate objects in 3D using a ZED stereo camera and PyTorch.
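The core idea of 3D localization fits in a few lines. The network produces a 2D mask for each object, and the ZED SDK can provide a point cloud aligned with the left image, with one (X, Y, Z) triple per pixel. Collecting the 3D points under the mask and taking a robust statistic gives an object position. This is a minimal sketch, not the sample's code: the nested-list point cloud layout, the non-finite (NaN or infinity) marking of missing depth, and the helper name locate_object are assumptions, and the sample project may use a different estimator.

```python
import math
from statistics import median
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def locate_object(
    mask: Sequence[Sequence[bool]],
    point_cloud: Sequence[Sequence[Point3D]],
) -> Optional[Point3D]:
    """Estimate an object's 3D position from its 2D segmentation mask.

    Hypothetical helper: point_cloud[v][u] is assumed to hold the (x, y, z)
    coordinates of pixel (u, v), with non-finite values where depth is missing.
    """
    xs: List[float] = []
    ys: List[float] = []
    zs: List[float] = []
    for row_mask, row_points in zip(mask, point_cloud):
        for inside, (x, y, z) in zip(row_mask, row_points):
            # Keep only masked pixels that have a valid depth measurement.
            if inside and all(math.isfinite(c) for c in (x, y, z)):
                xs.append(x)
                ys.append(y)
                zs.append(z)
    if not zs:
        return None  # no valid depth anywhere inside the mask
    # The per-axis median is less sensitive than the mean to background
    # pixels that leak in along the mask edges.
    return (median(xs), median(ys), median(zs))
```

The median is used here instead of the mean because segmentation masks often include a few pixels that belong to the background, whose depth would otherwise pull the estimate away from the object.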

Installation

The Mask R-CNN 3D project depends on the following libraries:

  • ZED SDK and Python API
  • PyTorch (with cuDNN)
  • OpenCV
  • CUDA
  • Python 3
  • Apex
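Once the packages are installed, a short script can confirm that their Python modules are importable from the active environment. This is a minimal sketch: the import names (pyzed for the ZED Python API, cv2 for OpenCV, maskrcnn_benchmark for Mask R-CNN) are the usual ones for these packages, and missing_modules is a hypothetical helper. CUDA and the ZED SDK itself are not Python packages, so this check does not cover them.

```python
import importlib.util
from typing import Dict, List

# Import names of the Python dependencies of the Mask R-CNN 3D project.
# maskrcnn_benchmark is installed later in this tutorial.
REQUIRED_MODULES: Dict[str, str] = {
    "ZED Python API": "pyzed",
    "PyTorch": "torch",
    "OpenCV": "cv2",
    "Apex": "apex",
    "Mask R-CNN": "maskrcnn_benchmark",
}


def missing_modules(modules: Dict[str, str]) -> List[str]:
    """Return the labels of top-level modules that cannot be found."""
    # find_spec returns None (without importing) when a module is absent.
    return [label for label, name in modules.items()
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Python dependencies were found.")
```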


Install the ZED SDK and Python API.

PyTorch Installation

We recommend creating a dedicated conda environment for PyTorch. Keep this environment activated while installing all of the following packages.

$ conda create --name pytorch1 -y
$ conda activate pytorch1

When installing PyTorch, the selected CUDA version must match the one used by the ZED SDK. Here, we use CUDA 10.0.

$ conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
$ conda install -c conda-forge --yes --file requirements.txt

Note: Do not forget to install the ZED Python API inside this environment as well.

Using Pip

$ pip3 install torch torchvision
$ pip3 install -r requirements.txt

For more information, please refer to the PyTorch setup page.
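A quick way to confirm that the installed PyTorch build matches the ZED SDK is to compare torch.version.cuda (which is None for CPU-only builds) with the CUDA version the ZED SDK was installed for. The sketch below assumes you set ZED_SDK_CUDA by hand; cuda_versions_match is a hypothetical helper that compares only the major and minor version components.

```python
from typing import Optional


def cuda_versions_match(torch_cuda: Optional[str], zed_cuda: str) -> bool:
    """Compare two CUDA version strings on their major.minor components.

    torch_cuda is what torch.version.cuda reports (None for CPU-only builds);
    zed_cuda is the CUDA version the ZED SDK was installed for, e.g. "10.0".
    """
    if torch_cuda is None:
        return False  # CPU-only PyTorch build: no CUDA at all
    return torch_cuda.split(".")[:2] == zed_cuda.split(".")[:2]


if __name__ == "__main__":
    ZED_SDK_CUDA = "10.0"  # assumption: set this to your ZED SDK's CUDA version
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("PyTorch", torch.__version__, "built with CUDA", torch.version.cuda)
        print("CUDA device available:", torch.cuda.is_available())
        print("cuDNN available:", torch.backends.cudnn.is_available())
        print("Matches ZED SDK CUDA:",
              cuda_versions_match(torch.version.cuda, ZED_SDK_CUDA))
```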

Apex Installation

The project uses NVIDIA’s Apex library. To install it, run the following commands:

$ git clone
$ cd apex
$ python3 install

Mask R-CNN Installation

Set up Mask R-CNN. If you’re using a conda environment, make sure it is still active before running the following commands.

$ git clone
$ cd maskrcnn-benchmark
$ python3 install

Running Mask R-CNN 3D

Download the sample project code from GitHub. The next commands are launched from the sample directory.

Run the sample with Python 3. Objects captured by your ZED camera should be detected by the Mask R-CNN ResNet-50 model and localized in 3D.

$ python --config-file configs/caffe2/e2e_mask_rcnn_R_50_C4_1x_caffe2.yaml --min-image-size 256

Testing Other Models

Pre-trained models can be found in the maskrcnn-benchmark model zoo; the model selected by the config file is downloaded automatically. Here we test Mask R-CNN with a ResNet-101 backbone:

$ python --config-file configs/caffe2/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml --min-image-size 300
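The config file names encode the model variant, which helps when choosing another model. The decoder below is a sketch based only on the naming pattern of the files used in this tutorial: "R_50"/"R_101" denote the ResNet depth, "C4" a single-scale backbone feature map versus "FPN" a feature pyramid network, and "1x" the training schedule. describe_config is a hypothetical helper, and config names that do not follow this pattern simply return None.

```python
import re
from typing import Dict, Optional

# Pattern assumed from names such as e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml
CONFIG_NAME = re.compile(
    r"e2e_(?P<task>mask_rcnn|keypoint_rcnn|faster_rcnn)"
    r"_R_(?P<depth>\d+)_(?P<features>C4|FPN)_(?P<schedule>\d+x)"
)


def describe_config(path: str) -> Optional[Dict[str, str]]:
    """Decode task, backbone, feature type and schedule from a config path."""
    name = path.rsplit("/", 1)[-1]  # keep only the file name
    match = CONFIG_NAME.match(name)
    if match is None:
        return None  # name does not follow the assumed pattern
    features = ("feature pyramid network (FPN)" if match["features"] == "FPN"
                else "single-scale C4 features")
    return {
        "task": match["task"],
        "backbone": f"ResNet-{match['depth']}",
        "features": features,
        "schedule": match["schedule"],
    }
```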

Now let’s test 3D keypoint extraction:

$ python --config-file configs/caffe2/e2e_keypoint_rcnn_R_50_FPN_1x_caffe2.yaml --min-image-size 300
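Lifting 2D keypoints into 3D can be illustrated with the pinhole camera model: a pixel (u, v) with metric depth Z maps to X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy, where fx, fy, cx, cy are the intrinsics of the left camera. This is a sketch under stated assumptions: the sample may read X, Y, Z directly from the ZED point cloud instead, the depth map layout depth_map[v][u] is assumed, and backproject and keypoints_to_3d are hypothetical helpers.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Optional[Point3D]:
    """Lift pixel (u, v) with depth Z into camera coordinates (pinhole model)."""
    if not math.isfinite(depth) or depth <= 0:
        return None  # no valid depth at this pixel
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def keypoints_to_3d(keypoints: Sequence[Tuple[float, float]],
                    depth_map: Sequence[Sequence[float]],
                    fx: float, fy: float, cx: float, cy: float
                    ) -> List[Optional[Point3D]]:
    """Convert 2D keypoints (pixel coordinates) to 3D points.

    depth_map[v][u] is assumed to hold the depth in meters at pixel (u, v).
    """
    points: List[Optional[Point3D]] = []
    for u, v in keypoints:
        row, col = int(round(v)), int(round(u))
        inside = 0 <= row < len(depth_map) and 0 <= col < len(depth_map[row])
        if not inside:
            points.append(None)  # keypoint falls outside the depth map
            continue
        points.append(backproject(u, v, depth_map[row][col], fx, fy, cx, cy))
    return points
```

Keypoints without valid depth are returned as None rather than dropped, so the output list stays aligned with the model's keypoint order (nose, eyes, and so on for COCO keypoints).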

Other Options

You can launch object segmentation on recorded videos in SVO format using the following command:

$ python --svo-filename path/to/svo_file.svo

The best accuracy is obtained with --min-image-size 800, at the cost of a lower frame rate:

$ python --min-image-size 800
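The frame-rate cost of a larger --min-image-size follows from how frames are resized. Assuming the sample resizes like the maskrcnn-benchmark demo predictor (a torchvision-style Resize that scales the shorter side to min-image-size and keeps the aspect ratio), a 1280x720 (HD720) frame becomes 1422x800 at size 800, almost ten times the pixel count of the 455x256 input used at size 256. Convolutional inference cost grows roughly with the number of input pixels. resized_size is a hypothetical helper.

```python
from typing import Tuple


def resized_size(width: int, height: int, min_size: int) -> Tuple[int, int]:
    """Return (width, height) after scaling the shorter side to min_size.

    Assumes torchvision-style Resize(min_size): the aspect ratio is kept and
    the longer side is truncated to an integer.
    """
    if width <= height:
        return min_size, int(min_size * height / width)
    return int(min_size * width / height), min_size


if __name__ == "__main__":
    # 1280x720 is the ZED HD720 resolution.
    for size in (256, 300, 800):
        w, h = resized_size(1280, 720, size)
        print(f"--min-image-size {size}: {w}x{h} = {w * h} pixels")
```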

To display the mask heatmaps, add --show-mask-heatmaps:

$ python --min-image-size 300 --show-mask-heatmaps

Finally, to run the model on the CPU, append the MODEL.DEVICE cpu option:

$ python --min-image-size 300 MODEL.DEVICE cpu
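The options above can be approximated with argparse: named flags for --config-file, --min-image-size, --svo-filename and --show-mask-heatmaps, plus a trailing positional list that collects config overrides such as MODEL.DEVICE cpu. This is a hypothetical re-creation, not the sample's actual code, and the default values are illustrative assumptions. In maskrcnn-benchmark-based scripts, such a trailing list is typically passed to the yacs config with cfg.merge_from_list(args.opts), which expects this flat KEY VALUE sequence.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of the sample's command-line options."""
    parser = argparse.ArgumentParser(description="Mask R-CNN 3D with a ZED camera")
    parser.add_argument("--config-file",
                        default="configs/caffe2/e2e_mask_rcnn_R_50_C4_1x_caffe2.yaml")
    parser.add_argument("--min-image-size", type=int, default=256)  # illustrative default
    parser.add_argument("--svo-filename", default=None,
                        help="play back an SVO recording instead of the live camera")
    parser.add_argument("--show-mask-heatmaps", action="store_true")
    # Everything left after the named flags (e.g. "MODEL.DEVICE cpu")
    # is collected as config overrides.
    parser.add_argument("opts", nargs=argparse.REMAINDER)
    return parser


def overrides(opts: List[str]) -> List[Tuple[str, str]]:
    """Group trailing KEY VALUE tokens into pairs, e.g. [("MODEL.DEVICE", "cpu")]."""
    if len(opts) % 2 != 0:
        raise ValueError("config overrides must come in KEY VALUE pairs")
    return list(zip(opts[0::2], opts[1::2]))


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
    print("Config overrides:", overrides(args.opts or []))
```

Named flags must come before the overrides, since every token after the first unrecognized positional is captured by opts.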