Existing techniques for monocular 3D detection have a serious limitation: they tend to perform well only on a limited set of benchmarks, typically either on ego-centric car views or on traffic-camera views, but rarely on both. To encourage progress, this work advocates for an extended evaluation of 3D detection frameworks across different camera perspectives.
We make two key contributions. First, we introduce the CARLA Drone dataset, CDrone. Simulating drone views, it substantially expands the diversity of camera perspectives in existing benchmarks. Despite its synthetic nature, CDrone represents a real-world challenge. To show this, we confirm that previous techniques struggle to perform well on both CDrone and a real-world in-house 3D drone dataset.
Second, we develop an effective data augmentation pipeline called GroundMix. Its distinguishing element is the use of the ground plane for creating 3D-consistent augmentations of a training image. GroundMix significantly boosts the detection accuracy of a lightweight one-stage detector. In our expanded evaluation, we achieve average precision on par with or substantially higher than the previous state of the art across all datasets.
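GroundMix's full pipeline is not detailed in this excerpt, but the core geometric primitive behind ground-based, 3D-consistent augmentation is placing a pasted object so that it rests on the ground plane. A minimal sketch of that primitive, back-projecting a pixel onto a known ground plane, is given below; the intrinsics, plane parameters, and function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def backproject_to_ground(u, v, K, plane_normal, plane_d):
    """Intersect the camera ray through pixel (u, v) with the ground plane
    n . X + d = 0 (in camera coordinates) and return the 3D ground point.
    Note: this helper is an illustrative sketch, not GroundMix itself."""
    # Ray direction in camera coordinates for pixel (u, v).
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Solve n . (t * ray) + d = 0 for the ray parameter t.
    t = -plane_d / (plane_normal @ ray)
    return t * ray

# Illustrative intrinsics and a flat ground plane 1.5 m below the camera
# (y points down in camera coordinates, so the plane is y = 1.5).
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
n = np.array([0.0, 1.0, 0.0])  # plane normal in camera coordinates
d = -1.5                       # n . X + d = 0  <=>  y = 1.5
P = backproject_to_ground(1160.0, 740.0, K, n, d)
# P lies on the ground plane at depth P[2] in front of the camera.
```

Anchoring pasted objects to such ground points keeps their scale and perspective consistent with the scene geometry, which is what makes the augmentation 3D-consistent rather than a flat 2D copy-paste.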
CDrone comprises 42 locations across 7 worlds within the CARLA simulation environment, encompassing urban and rural landscapes. The dataset features a mixture of nighttime, daytime, dawn and rainy scenes and is divided into 24 training, 9 validation and 9 test locations. To foster future research, we also release track IDs, depth maps and instance segmentation masks along with 2D and 3D bounding box labels. Annotations are provided in the OMNI3D format, and our evaluation toolkit enables 3D average precision evaluation with full support for SO(3) rotations.
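The toolkit's API is not shown here, but as a hedged illustration of what evaluating with full SO(3) rotations entails, one standard orientation-error metric is the geodesic distance on SO(3), i.e. the angle of the relative rotation between prediction and ground truth. The function names below are our own, not the toolkit's.

```python
import numpy as np

def so3_geodesic_error(R1, R2):
    """Geodesic distance (in radians) between two rotation matrices:
    the rotation angle of the relative rotation R1^T R2."""
    R = R1.T @ R2
    # trace(R) = 1 + 2*cos(theta); clip for numerical safety.
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_theta)

def rot_z(theta):
    """Rotation by theta about the z-axis (illustrative helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# A 90-degree yaw error yields a geodesic distance of pi/2.
err = so3_geodesic_error(np.eye(3), rot_z(np.pi / 2))
```

Unlike yaw-only metrics common in ego-centric car benchmarks, this distance penalizes errors about all three axes, which matters for drone views where objects are seen from above at varying tilt.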
cat part_* > cdrone_combined.zip; unzip cdrone_combined.zip
The CDrone dataset as well as the provided code are licensed under CC BY-NC-SA 4.0.
To address the practical reality of dynamic UAV trajectories, we additionally introduce the Moving CDrone dataset. It serves as a test set for assessing tracking performance in scenarios where the drone is not stationary but moves through the scene at speeds of up to 5 m/s. Complementing the standard static CDrone, this subset provides 111 videos of 30 seconds each.
Download the data from here into a folder and execute:
cat part_* > moving_cdrone_combined.zip; unzip moving_cdrone_combined.zip
The Moving CDrone dataset is also licensed under CC BY-NC-SA 4.0.
We also conduct experiments on our in-house real-world drone dataset, with similar results validating the reliability of our synthetic data and the effectiveness of the CDrone benchmark. Exemplary results for Epic Munich, Unparalleled Frankfurt, and Versatile Kronach can be found here.
Qualitative comparison of object tracking on non-flat terrain using a conventional 2D motion model (Left) versus our proposed motion model (Right). Over the visible trajectory (approximately 278 m), the accumulated position error of the 2D model is more than 3.5× larger, and the orientation error exceeds 6× that of our method. Performance gaps increase with steeper slopes and stronger gradient changes.
@inproceedings{meier2024cdrone,
author = {Meier, Johannes and Scalerandi, Luca and Dhaouadi, Oussema and Kaiser, Jacques and Araslanov, Nikita and Cremers, Daniel},
title = {{CARLA Drone:} Monocular 3D Object Detection from a Different Perspective},
booktitle = {German Conference on Pattern Recognition (GCPR)},
year = {2024},
}