MonoDistill: Learning Spatial Features for Monocular 3D Object Detection
Zhiyu Chong, Xinzhu Ma, Hong Zhang, Yuxin Yue, Haojie Li, Zhihui Wang, Wanli Ouyang

TL;DR
MonoDistill transfers knowledge from LiDAR signals to monocular 3D object detectors, improving their accuracy at no extra inference cost and achieving top performance on the KITTI benchmark.
Contribution
The paper presents a simple scheme to incorporate LiDAR-derived spatial information into monocular detectors via knowledge distillation, enhancing 3D detection accuracy.
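To make the distillation idea concrete, here is a minimal sketch of a generic feature-imitation loss, in which a student (monocular) feature vector is pulled toward a teacher (LiDAR Net) feature vector. This is an illustrative assumption, not the loss defined in the paper: the function name `feature_imitation_loss`, the flat-list feature representation, and the optional `mask` weighting are all hypothetical.

```python
def feature_imitation_loss(student, teacher, mask=None):
    """Masked mean squared error between two flattened feature vectors.

    student, teacher: equal-length lists of floats (hypothetical flattened features).
    mask: optional per-element weights, e.g. 1.0 for positions to imitate, 0.0 otherwise.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher features must have the same length")
    if mask is None:
        mask = [1.0] * len(student)  # imitate every position equally
    if len(mask) != len(student):
        raise ValueError("mask must have the same length as the features")

    total_weight = sum(mask)
    if total_weight == 0:
        return 0.0  # nothing selected, so no imitation signal

    squared_error = sum(m * (s - t) ** 2 for s, t, m in zip(student, teacher, mask))
    return squared_error / total_weight


# The teacher is treated as fixed; only the student would be updated by this loss.
loss_all = feature_imitation_loss([1.0, 2.0, 3.0], [1.0, 0.0, 3.0])
loss_masked = feature_imitation_loss([1.0, 2.0, 3.0], [1.0, 0.0, 3.0], mask=[0.0, 1.0, 0.0])
```

Because the teacher is only needed to compute this training signal, it can be discarded at test time, which is consistent with the claim of no extra inference cost.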
Findings
Significant performance boost on KITTI benchmark.
Effective knowledge transfer from LiDAR signals to monocular models.
Validated through extensive ablation studies.
Abstract
3D object detection is a fundamental and challenging task for 3D scene understanding, and the monocular-based methods can serve as an economical alternative to the stereo-based or LiDAR-based methods. However, accurately detecting objects in the 3D space from a single image is extremely difficult due to the lack of spatial cues. To mitigate this issue, we propose a simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors, without introducing any extra cost in the inference phase. In particular, we first project the LiDAR signals into the image plane and align them with the RGB images. After that, we use the resulting data to train a 3D detector (LiDAR Net) with the same architecture as the baseline model. Finally, this LiDAR Net can serve as the teacher to transfer the learned knowledge to the baseline model. Experimental results…
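The first step of the abstract, projecting LiDAR signals into the image plane, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: points are assumed to be already expressed in the camera frame (with z pointing forward), the camera is assumed to be a pinhole model with no skew, and the function name `project_lidar_to_depth_map` and its parameters are hypothetical.

```python
import math


def project_lidar_to_depth_map(points_cam, K, height, width):
    """Project 3D points onto the image plane to form a sparse depth map.

    points_cam: iterable of (x, y, z) tuples in the camera frame (assumed).
    K: 3x3 pinhole intrinsic matrix as nested lists (assumed, no skew).
    Returns a height x width nested list; 0.0 marks pixels with no measurement.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    depth = [[0.0] * width for _ in range(height)]

    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # point is behind the camera
        u = math.floor(fx * x / z + cx)  # pixel column
        v = math.floor(fy * y / z + cy)  # pixel row
        if not (0 <= u < width and 0 <= v < height):
            continue  # point falls outside the image
        # When several points land on one pixel, keep the nearest one.
        if depth[v][u] == 0.0 or z < depth[v][u]:
            depth[v][u] = z
    return depth


K = [[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]]
points = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (0.0, 0.0, -1.0),
          (1.0, 0.0, 10.0), (0.5, -0.5, 50.0)]
depth_map = project_lidar_to_depth_map(points, K, height=5, width=5)
```

The resulting map is pixel-aligned with the RGB image, so a network with the same architecture as the image-based baseline can consume it. With real sensor data, the points would first need to be transformed from the LiDAR frame into the camera frame using the dataset's calibration.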
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Industrial Vision Systems and Defect Detection
