CARLA Drone: Monocular 3D Object Detection from a Different Perspective

DeepScenario · TUM · MCML
Views (Waymo, Rope3D, CDrone) in comparison.

We perform monocular 3D object detection from the car view (Waymo), the traffic view (Rope3D), and the drone view (CDrone, ours). We achieve performance on par with or substantially higher than previous state-of-the-art methods across all tested datasets.

Abstract

Existing techniques for monocular 3D detection have a serious limitation: they tend to perform well only on a narrow set of benchmarks, faring well either on ego-centric car views or on traffic camera views, but rarely on both. To encourage progress, this work advocates for an extended evaluation of 3D detection frameworks across different camera perspectives.

We make two key contributions. First, we introduce the CARLA Drone dataset, CDrone. Simulating drone views, it substantially expands the diversity of camera perspectives in existing benchmarks. Despite its synthetic nature, CDrone represents a real-world challenge. To show this, we confirm that previous techniques struggle to perform well both on CDrone and a real-world in-house 3D drone dataset.

Second, we develop an effective data augmentation pipeline called GroundMix. Its distinguishing element is the use of the ground for creating 3D-consistent augmentations of a training image. GroundMix significantly boosts the detection accuracy of a lightweight one-stage detector. In our expanded evaluation, we achieve average precision on par with or substantially higher than the previous state of the art across all datasets.
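The abstract describes GroundMix only at a high level. As a rough, minimal sketch of what anchoring an augmentation to the ground can look like, the snippet below back-projects a pixel onto a known ground plane to obtain a 3D-consistent anchor point for a pasted object; the flat-ground assumption, function names, and numbers are ours for illustration, not the authors' implementation.

    import numpy as np

    def pixel_to_ground(u, v, K, n=np.array([0.0, 1.0, 0.0]), h=1.5):
        """Back-project pixel (u, v) onto the ground plane n . X = h.

        K is the 3x3 camera intrinsic matrix. By default the ground is the
        horizontal plane h meters below the camera (y points down in camera
        coordinates). The returned 3D point is where a pasted object's
        footprint can be anchored so the augmented scene stays 3D-consistent.
        """
        ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing-ray direction
        t = h / (n @ ray)                               # ray-plane intersection
        if t <= 0:
            raise ValueError("pixel does not hit the ground in front of the camera")
        return t * ray

    # Example: anchor an object pasted at pixel (960, 800) of a 1920x1080 image.
    K = np.array([[1000.0, 0.0, 960.0],
                  [0.0, 1000.0, 540.0],
                  [0.0, 0.0, 1.0]])
    print(pixel_to_ground(960.0, 800.0, K))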

CDrone

CDrone comprises 42 locations across 7 worlds within the CARLA simulation environment, encompassing urban and rural landscapes. The dataset features a mixture of nighttime, daytime, dawn, and rainy scenes and is divided into 24 training, 9 validation, and 9 test locations. To foster future research, we also release track IDs, depth maps, and instance segmentation masks along with 2D and 3D bounding box labels. Annotations are provided in the OMNI3D format, and our evaluation toolkit enables 3D average-precision evaluation with full support for SO(3) rotations.
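Because the toolkit evaluates with full SO(3) rotations rather than yaw-only boxes, an annotated box is determined by a center, its dimensions, and an arbitrary rotation matrix. Below is a minimal sketch of turning such a parameterization into the eight box corners; the argument names and axis conventions are our assumptions, not necessarily the toolkit's.

    import numpy as np

    def box3d_corners(center, dims, R):
        """Corners of an oriented 3D box under an arbitrary SO(3) rotation.

        center: (3,) box center in camera coordinates
        dims:   (3,) full extents (width, height, length) along the box axes
        R:      (3, 3) rotation matrix; any 3D rotation is allowed, not just
                a yaw around the vertical axis
        """
        half = 0.5 * np.asarray(dims, dtype=float)
        signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                                       for sy in (-1, 1)
                                       for sz in (-1, 1)], dtype=float)
        return (signs * half) @ R.T + np.asarray(center, dtype=float)

    # Example: a 2 x 1.5 x 4 m vehicle box, rotated 10 degrees about the z-axis.
    a = np.deg2rad(10.0)
    Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    print(box3d_corners(center=[0.0, 0.0, 20.0], dims=[2.0, 1.5, 4.0], R=Rz))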

42 locations · 7 worlds · 37,800 images at 1920×1080 · 174,958 annotations

Sample CDrone images with 2D & 3D bounding boxes, depth maps, and instance segmentation masks.

Dataset statistics: depth distribution, angle distribution, bounding box width distribution, and bounding box height distribution.

Usage of the dataset

  • Download all the files from here into a separate folder and execute:
    cat part_* > cdrone_combined.zip; unzip cdrone_combined.zip
  • Objects with a visibility below 0.35 are neither used for training nor for evaluation (see the sketch after this list)
  • Semantic segmentation classes by color: (0) None (1) Buildings (2) Cyclist & Motorcyclist (3) Pedestrian (4) Poles & Traffic signs & Traffic lights (5) Roads & Sidewalks (6) Vegetation (7) Vehicles (8) Walls & Fences
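To make the two conventions above concrete, here is a minimal Python sketch that drops low-visibility objects and names the segmentation classes. It assumes a COCO-style OMNI3D JSON with a per-annotation visibility field; the file path and field names are placeholders to adapt to the released annotations.

    import json

    # Semantic segmentation class IDs, as listed above.
    SEG_CLASSES = {
        0: "None",
        1: "Buildings",
        2: "Cyclist & Motorcyclist",
        3: "Pedestrian",
        4: "Poles & Traffic signs & Traffic lights",
        5: "Roads & Sidewalks",
        6: "Vegetation",
        7: "Vehicles",
        8: "Walls & Fences",
    }

    VISIBILITY_THRESHOLD = 0.35  # objects below this are ignored entirely

    def load_usable_annotations(path):
        """Load OMNI3D-style annotations and drop low-visibility objects."""
        with open(path) as f:
            data = json.load(f)
        return [ann for ann in data["annotations"]
                if ann.get("visibility", 1.0) >= VISIBILITY_THRESHOLD]

    annotations = load_usable_annotations("cdrone/annotations/train.json")
    print(f"{len(annotations)} annotations kept for training and evaluation")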

The CDrone dataset and the provided code are licensed under CC BY-NC-SA 4.0.

Moving CDrone

To address the practical reality of dynamic UAV trajectories, we introduce the Moving CDrone dataset as an additional evaluation dataset. This dataset serves as a test set to assess tracking performance in scenarios where the drone is not stationary but moves through the scene at speeds of up to 5 m/s. Complementing the standard static CDrone, this subset provides 111 videos of 30 seconds each.

111 videos · 30 s each · 33,300 images · 188,468 annotations

Download

Download the data from here into a folder and execute:
cat part_* > moving_cdrone_combined.zip; unzip moving_cdrone_combined.zip

The Moving CDrone dataset is also licensed under CC BY-NC-SA 4.0.


Experiments

CDrone

CDrone results

Rope3D

Rope3D results

Waymo

Waymo results

We also conduct experiments on our in-house real-world drone dataset, where similar results validate the reliability of our synthetic data and the effectiveness of the CDrone benchmark. Please find here exemplary results for Epic Munich, Unparalleled Frankfurt, and Versatile Kronach.

Real-world drone data.
For more real-world examples, visit DeepScenario's web app.


Tracking visualization

Qualitative comparison of object tracking on non-flat terrain using a conventional 2D motion model (left) versus our proposed motion model (right). Over the visible trajectory (approximately 278 m), the 2D model accumulates more than 3.5× the position error and more than 6× the orientation error of our method. The performance gap widens with steeper slopes and stronger gradient changes.
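The page does not detail either motion model. As a rough illustration of why ground awareness matters on sloped terrain, the sketch below contrasts a planar constant-velocity step, which never corrects height, with one that re-anchors the state to the local terrain elevation; the terrain_height function is a stand-in for any elevation source, and none of this is the authors' implementation.

    import numpy as np

    def terrain_height(x, y):
        """Stand-in elevation model (a gentle slope); replace with real terrain."""
        return 0.05 * x + 0.02 * y

    def step_2d(pos, vel, dt):
        """Conventional planar constant-velocity step: the height is never
        corrected, so the track drifts off the ground on sloped terrain."""
        return pos + dt * np.array([vel[0], vel[1], 0.0])

    def step_ground_aware(pos, vel, dt):
        """Same planar velocity, but the height is re-anchored to the terrain,
        keeping the tracked object glued to the (possibly sloped) ground."""
        nxt = pos + dt * np.array([vel[0], vel[1], 0.0])
        nxt[2] = terrain_height(nxt[0], nxt[1])
        return nxt

    start = np.array([0.0, 0.0, terrain_height(0.0, 0.0)])
    pos2d, pos3d = start.copy(), start.copy()
    vel = (10.0, 0.0)                 # 10 m/s along x
    for _ in range(50):               # 5 s at dt = 0.1 s
        pos2d = step_2d(pos2d, vel, 0.1)
        pos3d = step_ground_aware(pos3d, vel, 0.1)
    print("2D model height error:", abs(pos2d[2] - terrain_height(*pos2d[:2])))
    print("ground-aware height error:", abs(pos3d[2] - terrain_height(*pos3d[:2])))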

BibTeX

@inproceedings{meier2024cdrone,
  author    = {Meier, Johannes and Scalerandi, Luca and Dhaouadi, Oussema and Kaiser, Jacques and Araslanov, Nikita and Cremers, Daniel},
  title     = {{CARLA Drone:} Monocular 3D Object Detection from a Different Perspective},
  booktitle = {German Conference on Pattern Recognition (GCPR)},
  year      = {2024},
}