TL;DR
MonoSAOD is a framework for monocular 3D object detection on sparsely annotated datasets. It combines road-aware patch augmentation with prototype-guided pseudo-labeling to improve detection accuracy when only a fraction of objects are labeled.
Contribution
It proposes two modules, Road-Aware Patch Augmentation (RAPA) and Prototype-Based Filtering (PBF), that make effective use of sparse annotations for monocular 3D detection and reduce the dependence on costly exhaustive 3D labeling.
Findings
Significant performance improvement on sparse datasets
Effective pseudo-label filtering using prototype similarity and depth uncertainty
Robust detection with geometry-preserving augmentation
Abstract
Monocular 3D object detection has achieved impressive performance on densely annotated datasets. However, it struggles when only a fraction of objects are labeled due to the high cost of 3D annotation. This sparsely annotated setting is common in real-world scenarios where annotating every object is impractical. To address this, we propose a novel framework for sparsely annotated monocular 3D object detection with two key modules. First, we propose Road-Aware Patch Augmentation (RAPA), which leverages sparse annotations by augmenting segmented object patches onto road regions while preserving 3D geometric consistency. Second, we propose Prototype-Based Filtering (PBF), which generates high-quality pseudo-labels by filtering predictions through prototype similarity and depth uncertainty. It maintains global 2D RoI feature prototypes and selects pseudo-labels that are both…
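The filtering idea described for PBF can be illustrated with a minimal sketch. It assumes per-class prototypes stored as a NumPy array, cosine similarity as the similarity measure, an exponential-moving-average prototype update, and fixed thresholds. None of these details are given in the abstract, and the function names (`update_prototypes`, `filter_pseudo_labels`) and parameter values are hypothetical, so this should be read as one plausible reading, not the authors' implementation.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (N, D) arrays."""
    a_norm = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b_norm = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return np.sum(a_norm * b_norm, axis=1)


def update_prototypes(prototypes, features, class_ids, momentum=0.9):
    """Return a copy of the (C, D) prototypes, updated from labeled RoI features.

    The EMA rule and the momentum value are assumptions for illustration.
    """
    updated = prototypes.copy()
    for c in np.unique(class_ids):
        class_mean = features[class_ids == c].mean(axis=0)
        updated[c] = momentum * updated[c] + (1.0 - momentum) * class_mean
    return updated


def filter_pseudo_labels(features, class_ids, depth_sigma, prototypes,
                         sim_thresh=0.7, sigma_thresh=0.5):
    """Keep predictions that resemble their class prototype and have
    low predicted depth uncertainty. Both thresholds are hypothetical.

    features:    (N, D) 2D RoI features of the predictions
    class_ids:   (N,)   predicted class index per prediction
    depth_sigma: (N,)   predicted depth uncertainty per prediction
    prototypes:  (C, D) one global prototype per class
    """
    sims = cosine_similarity(features, prototypes[class_ids])
    keep = (sims >= sim_thresh) & (depth_sigma <= sigma_thresh)
    return keep, sims


if __name__ == "__main__":
    prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
    features = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]])
    class_ids = np.array([0, 0, 1])
    depth_sigma = np.array([0.2, 0.2, 0.2])
    keep, sims = filter_pseudo_labels(features, class_ids, depth_sigma, prototypes)
    print(keep)  # only the first prediction matches its class prototype
```

In this toy input, the second and third predictions are rejected because their features point away from the prototype of their predicted class. Raising `depth_sigma` above `sigma_thresh` for the first prediction would reject it as well, even though its similarity is high.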
