Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection
Sondos Mohamed, Walter Zimmer, Ross Greer, Ahmed Alaaeldin Ghita, Modesto Castrillón-Santana, Mohan Trivedi, Alois Knoll, Salvatore Mario Carta, Mirko Marras

TL;DR
This paper presents a two-stage transfer learning approach for monocular 3D object detection in roadside scenes: a model is first trained on synthetic data and then fine-tuned on real datasets, which raises mean average precision on both real-world benchmarks evaluated.
Contribution
It introduces a transfer learning strategy from synthetic data (RoadSense3D) to real data for monocular 3D detection, evaluated with the Cube R-CNN model on the TUM Traffic A9 Highway and DAIR-V2X-I benchmarks.
Findings
Mean average precision improved from 0.26 to 12.76 on TUM Traffic A9.
Mean average precision increased from 2.09 to 6.60 on DAIR-V2X-I.
Two-stage training effectively bridges the gap between synthetic and real data.
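To put the two reported mean average precision changes side by side, the short Python sketch below computes the absolute and relative gains from the figures quoted above. The function name `gain` and the dictionary layout are illustrative choices, not part of the paper, and the values are used exactly as reported here without assuming a particular unit.

```python
def gain(before, after):
    """Return (absolute gain, multiplicative factor) between two scores."""
    return after - before, after / before

# Scores as quoted in the findings above: (without, with) transfer learning.
reported = {
    "TUM Traffic A9 Highway": (0.26, 12.76),
    "DAIR-V2X-I": (2.09, 6.60),
}

for name, (before, after) in reported.items():
    delta, factor = gain(before, after)
    print(f"{name}: +{delta:.2f} points, {factor:.1f}x")
# TUM Traffic A9 Highway: +12.50 points, 49.1x
# DAIR-V2X-I: +4.51 points, 3.2x
```

The relative factor is much larger on TUM Traffic A9 Highway mainly because its starting score is close to zero, so the absolute gains are the safer numbers to compare.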
Abstract
Accurately detecting 3D objects from monocular images in dynamic roadside scenarios remains a challenging problem due to varying camera perspectives and unpredictable scene conditions. This paper introduces a two-stage training strategy to address these challenges. Our approach initially trains a model on the large-scale synthetic dataset, RoadSense3D, which offers a diverse range of scenarios for robust feature learning. Subsequently, we fine-tune the model on a combination of real-world datasets to enhance its adaptability to practical conditions. Experimental results of the Cube R-CNN model on challenging public benchmarks show a remarkable improvement in detection performance, with a mean average precision rising from 0.26 to 12.76 on the TUM Traffic A9 Highway dataset and from 2.09 to 6.60 on the DAIR-V2X-I dataset when performing transfer learning. Code, data, and qualitative…
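The two-stage schedule described in the abstract can be illustrated with a deliberately tiny stand-in. The Python sketch below is not the authors' pipeline and does not use Cube R-CNN, RoadSense3D, or any real dataset: it assumes a one-dimensional linear model, toy "synthetic" and "real" data drawn from slightly different distributions, and hypothetical helper names (`make_data`, `train`, `mse`). It only shows the training order: pretrain on a large synthetic set, then continue training from those weights on a small real set.

```python
import random

def make_data(n, slope, bias, noise, seed):
    """Toy dataset: y = slope * x + bias + Gaussian noise (assumed stand-in)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, slope * x + bias + rng.gauss(0.0, noise)))
    return data

def mse(params, data):
    """Mean squared error of the linear model (w, b) on the data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(params, data, lr, epochs):
    """Full-batch gradient descent on MSE, starting from the given params."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

# Large "synthetic" set and small "real" set with a mild domain shift.
synthetic = make_data(2000, slope=2.0, bias=0.5, noise=0.05, seed=0)
real_train = make_data(20, slope=2.3, bias=0.3, noise=0.05, seed=1)
real_test = make_data(200, slope=2.3, bias=0.3, noise=0.05, seed=2)

init = (0.0, 0.0)

# Stage 1: pretrain on synthetic data.
pretrained = train(init, synthetic, lr=0.1, epochs=200)
# Stage 2: fine-tune the pretrained weights on the small real set.
finetuned = train(pretrained, real_train, lr=0.05, epochs=20)
# Baseline: same real-data budget, but starting from scratch.
scratch = train(init, real_train, lr=0.05, epochs=20)

print("real-only  test MSE:", mse(scratch, real_test))
print("synthetic-only test MSE:", mse(pretrained, real_test))
print("two-stage  test MSE:", mse(finetuned, real_test))
```

In this toy setting the fine-tuned model starts stage 2 close to the real-domain solution, so the same short real-data budget gets it further than training from scratch. Whether and how strongly this carries over to a 3D detector depends on the model and data, which is what the paper's experiments measure.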
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Robotics and Sensor-Based Localization
Methods: Batch Normalization · 1x1 Convolution · Convolution · Thinned U-shape Module
