Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision
Youngseok Kim, Sanmin Kim, Sangmin Sim, Jun Won Choi, Dongsuk Kum

TL;DR
This paper introduces an object-centric depth supervision method that enhances monocular 3D object detection by jointly training detection and depth prediction using raw LiDAR data, improving accuracy and efficiency.
Contribution
The paper proposes a novel object-centric depth loss and an end-to-end training framework that leverages raw LiDAR points without extra annotation, boosting monocular 3D detection performance.
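This summary does not give the exact form of the object-centric depth loss. The sketch below is therefore only an assumption about one plausible variant, not the authors' implementation. It computes an L1 depth error only at pixels that meet two conditions: the pixel carries a projected LiDAR depth, and it falls inside an object's 2D box. As a result, background pixels contribute nothing to the loss.

The following are all illustrative assumptions:
- the function name `object_centric_depth_loss`,
- the rectangular box mask,
- the choice of L1 error.

```python
import numpy as np


def object_centric_depth_loss(pred_depth, lidar_depth, boxes):
    """Hypothetical object-centric depth loss (illustrative sketch only).

    pred_depth:  (H, W) predicted depth map in meters.
    lidar_depth: (H, W) sparse LiDAR depth map; 0 where no point projects.
    boxes:       iterable of integer (x1, y1, x2, y2) pixel boxes marking
                 foreground objects (assumed available from 3D/2D labels).
    Returns the mean absolute depth error over LiDAR pixels inside boxes.
    """
    h, w = pred_depth.shape
    foreground = np.zeros((h, w), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the image before marking it as foreground.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        foreground[y1:y2, x1:x2] = True

    # Supervise only pixels that are both on an object and hit by LiDAR.
    valid = foreground & (lidar_depth > 0)
    n_valid = int(valid.sum())
    if n_valid == 0:
        return 0.0
    err = np.abs(pred_depth[valid] - lidar_depth[valid])
    return float(err.sum() / n_valid)
```

Restricting the loss to object regions is what distinguishes this from a dense depth-map loss. Under this assumption, the network spends its depth capacity on the pixels that matter for 3D boxes rather than on road or sky.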
Findings
Outperforms depth map approaches on KITTI and nuScenes
Maintains real-time inference speed
Significantly improves 3D detection accuracy
Abstract
Recent advances in monocular 3D detection explicitly use a depth estimation network as an intermediate stage of the 3D detection network. Depth map approaches yield more accurate object depth than other methods because their depth estimation networks are trained on large-scale datasets. However, depth map approaches are limited by the accuracy of the depth map, and running two separate networks sequentially for depth estimation and 3D detection significantly increases computation cost and inference time. In this work, we propose a method to boost an RGB image-based 3D detector by jointly training the detection network with a depth prediction loss analogous to the one used in depth estimation. In this way, our 3D detection network receives additional depth supervision from raw LiDAR points, which require no human annotation, allowing it to estimate accurate depth without…
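The abstract relies on two ingredients: depth targets taken directly from raw LiDAR, and a detection loss trained jointly with a depth loss. Both can be sketched as below. This summary does not describe the authors' calibration pipeline or loss weighting, so the sketch rests on the following assumptions:
- the LiDAR points have already been transformed into the camera frame,
- the camera follows a pinhole model with intrinsic matrix `K`,
- the two losses are balanced by a hypothetical scalar `depth_weight`.

Both function names are illustrative.

```python
import numpy as np


def lidar_to_sparse_depth(points_cam, K, height, width):
    """Project camera-frame LiDAR points into a sparse depth map (sketch).

    points_cam: (N, 3) array of x, y, z in camera coordinates (z forward, m).
    K:          (3, 3) pinhole intrinsic matrix (assumed known).
    Returns an (height, width) map with depth where a point lands, else 0.
    """
    pts = points_cam[points_cam[:, 2] > 0]   # keep points in front of camera
    uvw = pts @ K.T                          # rows are (u*z, v*z, z)
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    d = pts[:, 2]

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, d = u[inside], v[inside], d[inside]

    # Keep the nearest point when several project to the same pixel.
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), d)
    depth[np.isinf(depth)] = 0.0             # 0 marks "no LiDAR target"
    return depth


def joint_loss(det_loss, depth_loss, depth_weight=1.0):
    """Total training loss: detection plus weighted auxiliary depth loss.

    depth_weight is a hypothetical balancing hyperparameter.
    """
    return det_loss + depth_weight * depth_loss
```

`np.minimum.at` is used because a plain fancy-index assignment with repeated pixel indices leaves the surviving value unspecified. The unbuffered ufunc makes "nearest point wins" deterministic. Because the targets come straight from the sensor, no human labeling enters this step.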
Taxonomy
Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Advanced Vision and Imaging
