Towards Generalization Across Depth for Monocular 3D Object Detection
Andrea Simonelli, Samuel Rota Bulò, Lorenzo Porzi, Elisa Ricci, Peter Kontschieder

TL;DR
This paper introduces MoVi-3D, a novel monocular 3D object detection architecture that uses virtual views to normalize object appearance across distances, achieving state-of-the-art results with a lightweight model.
Contribution
The work presents a new single-stage deep architecture that leverages geometrical virtual views to improve monocular 3D detection and reduce model complexity.
Findings
Achieves state-of-the-art results on the KITTI3D benchmark.
Uses virtual view generation to normalize object appearance across distances.
Enables a lightweight model to perform effectively in 3D detection.
Abstract
While expensive LiDAR and stereo camera rigs have enabled the development of successful 3D object detection methods, monocular RGB-only approaches lag far behind. This work advances the state of the art by introducing MoVi-3D, a novel, single-stage deep architecture for monocular 3D object detection. MoVi-3D builds upon a novel approach which leverages geometrical information to generate, both at training and test time, virtual views where the object appearance is normalized with respect to distance. These virtually generated views facilitate the detection task as they significantly reduce the visual appearance variability associated with objects placed at different distances from the camera. As a consequence, the deep model is relieved from learning depth-specific representations and its complexity can be significantly reduced. In particular, in this work we show that, thanks to our…
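The distance normalization described in the abstract can be illustrated with plain pinhole geometry. The sketch below is not the authors' implementation. It assumes a virtual view is obtained by cropping the image region onto which a fixed metric window (`win_h` metres tall) placed at the object's depth projects, and then resizing that crop to a fixed height of `out_h` pixels. Under that assumption an object of height H at depth Z spans f·H/Z pixels in the original image, the crop spans f·win_h/Z pixels, and after resizing the object spans out_h·H/win_h pixels, independent of Z. The names `Pinhole`, `virtual_view_window`, `apparent_height_in_virtual_view`, and the camera values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pinhole:
    """Hypothetical pinhole intrinsics: focal lengths and principal point, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float

    def project(self, x: float, y: float, z: float) -> tuple[float, float]:
        # Perspective projection of a camera-frame 3D point (metres) to pixel coordinates.
        return self.fx * x / z + self.cx, self.fy * y / z + self.cy


def virtual_view_window(cam, center_x, center_y, depth, win_w, win_h):
    """Pixel box (u0, v0, u1, v1) covering a win_w x win_h metric window
    centred at (center_x, center_y, depth) in the camera frame (assumed construction)."""
    u0, v0 = cam.project(center_x - win_w / 2, center_y - win_h / 2, depth)
    u1, v1 = cam.project(center_x + win_w / 2, center_y + win_h / 2, depth)
    return u0, v0, u1, v1


def apparent_height_px(cam, obj_h, depth):
    # Height in pixels of an object of metric height obj_h at the given depth: f * H / Z.
    return cam.fy * obj_h / depth


def apparent_height_in_virtual_view(cam, obj_h, depth, win_h, out_h):
    """Object height in pixels after cropping the window at `depth` and resizing it to out_h rows."""
    _, v0, _, v1 = virtual_view_window(cam, 0.0, 0.0, depth, 1.0, win_h)
    scale = out_h / (v1 - v0)  # resize factor from the crop to the fixed virtual-view height
    return apparent_height_px(cam, obj_h, depth) * scale


# Assumed, roughly KITTI-like intrinsics; a 1.5 m tall car seen at two distances.
cam = Pinhole(fx=720.0, fy=720.0, cx=620.0, cy=180.0)
for z in (10.0, 40.0):
    raw = apparent_height_px(cam, 1.5, z)
    virt = apparent_height_in_virtual_view(cam, 1.5, z, win_h=3.0, out_h=100)
    print(f"depth {z:.0f} m: raw {raw:.1f} px, virtual view {virt:.1f} px")
# depth 10 m: raw 108.0 px, virtual view 50.0 px
# depth 40 m: raw 27.0 px, virtual view 50.0 px
```

In the raw image the car shrinks fourfold between 10 m and 40 m, while in the virtual views both instances occupy the same 50 px. This is the kind of appearance variability the abstract says the network no longer has to model. In this idealized model the distant car is upsampled from a 54 px crop, so the normalization equalizes scale but not the amount of image detail. How the depths of the virtual views are chosen at test time is outside the scope of this sketch.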
