Towards Safe, Real-Time Systems: Stereo vs Images and LiDAR for 3D Object Detection
Matthew Levine

TL;DR
This paper evaluates stereo vision as a cheaper, logistically simpler, and safer alternative to LiDAR, and as a replacement for monocular input, in 3D object detection. It reports 3D localization comparable to LiDAR in certain contexts and calibration benefits over image-only methods.
Contribution
It introduces a multimodal learning approach that combines traditional disparity algorithms with image-based detectors, improving results without increasing the number of parameters.
Findings
Stereo can match LiDAR in 3D localization in certain contexts
Multimodal learning with disparity algorithms enhances image-based detection
Several small but common mistakes in computing metrics on the KITTI dataset are identified
Abstract
As object detectors rapidly improve, attention has expanded past image-only networks to include a range of 3D and multimodal frameworks, especially ones that incorporate LiDAR. However, due to cost, logistics, and even some safety considerations, stereo can be an appealing alternative. Towards understanding the efficacy of stereo as a replacement for monocular input or LiDAR in object detectors, we show that multimodal learning with traditional disparity algorithms can improve image-based results without increasing the number of parameters, and that learning over stereo error can impart similar 3D localization power to LiDAR in certain contexts. Furthermore, doing so also has calibration benefits with respect to image-only methods. We benchmark on the public dataset KITTI, and in doing so, reveal a few small but common algorithmic mistakes currently used in computing metrics on that…
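For readers unfamiliar with how a disparity map carries 3D information, the standard pinhole stereo relation Z = f * B / d can be sketched as follows. This is a minimal illustration of textbook stereo geometry, not the paper's method. The function names and the focal length, baseline, and disparity values are hypothetical and chosen only to keep the arithmetic simple.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity (pixels) to depth (metres) via Z = f * B / d."""
    if disparity_px <= 0:
        # Assumption: a non-positive disparity means no valid match
        # (or a point at infinity), so no finite depth is returned.
        return float("inf")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_err_px=1.0):
    """First-order depth uncertainty: |dZ| ~= Z^2 / (f * B) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Hypothetical rig: 720 px focal length, 0.5 m baseline.
FOCAL_PX = 720.0
BASELINE_M = 0.5

near = disparity_to_depth(36.0, FOCAL_PX, BASELINE_M)  # 10 m away
far = disparity_to_depth(9.0, FOCAL_PX, BASELINE_M)    # 40 m away

# The same one-pixel matching error costs far more depth accuracy at range.
near_err = depth_error(near, FOCAL_PX, BASELINE_M)
far_err = depth_error(far, FOCAL_PX, BASELINE_M)
```

Because the depth error grows with the square of the distance, a fixed disparity error matters far more for distant objects. That is one reason a learned treatment of stereo error, as the abstract describes, is a natural target.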
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
