IDEAL-M3D: Instance Diversity-Enriched Active Learning for Monocular 3D Detection
Johannes Meier, Florian Günther, Riccardo Marin, Oussema Dhaouadi, Jacques Kaiser, Daniel Cremers

TL;DR
IDEAL-M3D introduces an instance-level active learning approach for monocular 3D detection, leveraging diversity to efficiently select informative samples and reduce annotation costs while maintaining high detection performance.
Contribution
It is the first instance-level active learning pipeline for monocular 3D detection that uses a diverse ensemble to improve sample selection efficiency.
Findings
Achieves comparable or better AP3D using only 60% of the annotations.
Demonstrates the effectiveness of diversity-driven sample selection.
Reduces annotation costs significantly while maintaining detection accuracy.
Abstract
Monocular 3D detection relies on just a single camera and is therefore easy to deploy. Yet, achieving reliable 3D understanding from monocular images requires substantial annotation, and 3D labels are especially costly. To maximize performance under constrained labeling budgets, it is essential to prioritize annotating samples expected to deliver the largest performance gains. This prioritization is the focus of active learning. Curiously, we observed two significant limitations in active learning algorithms for 3D monocular object detection. First, previous approaches select entire images, which is inefficient, as non-informative instances contained in the same image also need to be labeled. Secondly, existing methods rely on uncertainty-based selection, which in monocular 3D object detection creates a bias toward depth ambiguity. Consequently, distant objects are selected, while…
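The abstract contrasts uncertainty-based selection with diversity-driven, instance-level selection. As a rough illustration of what a diversity criterion can look like, the sketch below uses k-center greedy (farthest-point) selection, a common generic heuristic in active learning. It is not taken from the paper: the function name `k_center_greedy`, the per-instance feature vectors, and the `budget` parameter are all assumptions made for this example, and the source does not state how IDEAL-M3D measures diversity.

```python
import math
from typing import List, Sequence


def k_center_greedy(
    features: Sequence[Sequence[float]],
    budget: int,
    seed_index: int = 0,
) -> List[int]:
    """Pick up to `budget` instance indices that are spread out in feature space.

    `features` holds one (hypothetical) embedding per detected object instance.
    This is a generic diversity heuristic, not the selection rule of the paper.
    """
    n = len(features)
    if n == 0 or budget <= 0:
        return []
    budget = min(budget, n)

    selected = [seed_index]
    # Distance from every instance to its nearest already-selected instance.
    min_dist = [math.dist(f, features[seed_index]) for f in features]

    while len(selected) < budget:
        chosen = set(selected)
        remaining = [i for i in range(n) if i not in chosen]
        # Take the unselected instance farthest from everything chosen so far.
        next_index = max(remaining, key=lambda i: min_dist[i])
        selected.append(next_index)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[next_index]))
    return selected


# Toy usage: two tight clusters plus one isolated instance.
toy_features = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.0, 0.1], [5.0, 5.0]]
print(k_center_greedy(toy_features, budget=3))
```

Because each pick maximizes the distance to the instances already chosen, near-duplicate instances are skipped in favor of ones that cover new regions of the feature space. That is the general intuition behind preferring diversity to a pure uncertainty ranking, which the abstract says is biased toward depth ambiguity.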
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
