CARLA Drone: Monocular 3D Object Detection from a Different Perspective

Johannes Meier, Luca Scalerandi, Oussema Dhaouadi, Jacques Kaiser, Nikita Araslanov, Daniel Cremers
DeepScenario · TUM · MCML
Camera views (Waymo, Rope3D, CDrone) in comparison.

We perform monocular 3D object detection from the car-view (Waymo), traffic-view (Rope3D), and drone-view (CDrone, ours) perspectives. We achieve performance on par with or substantially higher than previous state-of-the-art methods across all tested datasets.

Abstract

Existing techniques for monocular 3D detection have a serious limitation: they tend to perform well only on a limited set of benchmarks, faring well either on ego-centric car views or on traffic camera views, but rarely on both. To encourage progress, this work advocates for an extended evaluation of 3D detection frameworks across different camera perspectives.

We make two key contributions. First, we introduce the CARLA Drone dataset, CDrone. Simulating drone views, it substantially expands the diversity of camera perspectives beyond existing benchmarks. Despite its synthetic nature, CDrone represents a real-world challenge. To show this, we confirm that previous techniques struggle to perform well both on CDrone and on a real-world in-house 3D drone dataset.

Second, we develop an effective data augmentation pipeline called GroundMix. Its distinguishing element is the use of the ground plane for creating 3D-consistent augmentations of a training image. GroundMix significantly boosts the detection accuracy of a lightweight one-stage detector. In our expanded evaluation, we achieve average precision on par with or substantially higher than the previous state of the art across all datasets.
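
GroundMix itself is detailed in the paper; purely as a hypothetical illustration of the geometry that makes ground-based pastes 3D-consistent (not the authors' actual pipeline), the Python sketch below shows that when an object slides from one ground point to another, its 2D paste anchor follows the projection of the new point and its crop is rescaled by the inverse depth ratio. The function names and pinhole intrinsics are assumptions for illustration.

    import numpy as np

    def project(K, p):
        # Project a 3D point in camera coordinates to pixel coordinates.
        uvw = K @ p
        return uvw[:2] / uvw[2]

    def ground_repaste(K, p_src, p_dst):
        # Moving an object crop from ground point p_src to p_dst (both in
        # camera coordinates): the new 2D anchor is the projection of p_dst,
        # and the crop is rescaled by the depth ratio z_src / z_dst, so the
        # edited image stays consistent with the 3D scene.
        scale = p_src[2] / p_dst[2]
        return project(K, p_dst), scale

    # Example: pinhole camera, focal length 1000 px, principal point (960, 540).
    K = np.array([[1000.0, 0.0, 960.0],
                  [0.0, 1000.0, 540.0],
                  [0.0, 0.0, 1.0]])
    anchor, scale = ground_repaste(K, np.array([2.0, 1.5, 10.0]),
                                   np.array([-1.0, 1.5, 20.0]))
    print(anchor, scale)  # new paste location and size factor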


Our novel CDrone dataset

Dataset Properties

CDrone comprises 42 locations across 7 worlds within the CARLA simulation environment, encompassing urban and rural landscapes. Each recording is populated with 265 vehicles. With 900 images per location captured at a rate of 12.5 frames per second and a resolution of 1920×1080 pixels, the dataset features a mixture of nighttime, daytime, dawn, and rainy scenes. In total, CDrone contains 174,958 car, 18,556 truck, 25,080 motorcycle, 17,476 bicycle, 2,674 bus, and 23,918 pedestrian annotations. It is divided into 24 training (21,600 images), 9 validation (8,100 images), and 9 test locations (8,100 images). To foster future research, we also release track IDs, depth maps, and instance segmentation masks along with the 2D and 3D bounding box labels. Annotations are provided in the OMNI3D format, and our evaluation toolkit enables 3D average precision evaluation with full support for SO(3) rotations.
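
Unlike yaw-only evaluation, supporting full SO(3) orientations means boxes may be arbitrarily tilted relative to the camera, which matters for drone views. As a minimal sketch of the underlying geometry (not the toolkit's actual API; the names here are assumptions), such a box can be expanded to its eight corners before 3D-IoU matching:

    import numpy as np

    def box_corners(center, dims, R):
        # Corners of a 3D box with a full SO(3) orientation R (3x3 rotation
        # matrix); center is the box midpoint, dims = (w, h, l) are full extents.
        w, h, l = dims
        x = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * (w / 2.0)
        y = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * (h / 2.0)
        z = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * (l / 2.0)
        corners = np.stack([x, y, z])      # (3, 8) corner offsets
        return (R @ corners).T + center    # (8, 3) corners in camera coordinates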

2D & 3D Bounding Boxes

CDrone sample image

Depth Maps

CDrone sample image

Inst. Segmentation Masks

CDrone sample image
CDrone sample image
CDrone sample image
CDrone sample image
CDrone sample image

Usage of the dataset

  • Download the dataset from here into a separate folder and execute:
    cat part_* > cdrone_combined.zip; unzip cdrone_combined.zip
  • Objects with a visibility below 0.2 are neither used for training nor for evaluation; see the loading sketch below
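
A minimal loading sketch, assuming the OMNI3D-style annotations use a COCO-like JSON layout ("images", "annotations", "categories") with a per-annotation "visibility" field in [0, 1]; the exact schema should be verified against the released files:

    import json

    VISIBILITY_THRESHOLD = 0.2

    def load_annotations(path):
        # Load an OMNI3D-style annotation file and drop objects below the
        # visibility threshold, mirroring the training/evaluation protocol
        # above. The "visibility" field name is an assumption about the schema.
        with open(path) as f:
            data = json.load(f)
        kept = [a for a in data["annotations"]
                if a.get("visibility", 1.0) >= VISIBILITY_THRESHOLD]
        return data["images"], kept, data["categories"]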

The CDrone dataset and the provided code are licensed under CC BY-NC-SA 4.0.


Experiments

[Qualitative detection results on CDrone, Rope3D, and Waymo]

We also conduct experiments on our in-house real-world drone dataset; the similar results validate the reliability of our synthetic data and the effectiveness of the CDrone benchmark. Please find here exemplary results for Epic Munich, Unparalleled Frankfurt, and Versatile Kronach.

[Real-world drone data examples]
For more real-world examples, visit  DeepScenario's web app.

BibTeX

@inproceedings{meier2024cdrone,
  author    = {Meier, Johannes and Scalerandi, Luca and Dhaouadi, Oussema and Kaiser, Jacques and Araslanov, Nikita and Cremers, Daniel},
  title     = {{CARLA Drone:} Monocular 3D Object Detection from a Different Perspective},
  booktitle = {German Conference on Pattern Recognition (GCPR)},
  year      = {2024},
}