YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection
Yuxuan Liu, Lujia Wang, Ming Liu

TL;DR
YOLOStereo3D combines insights from 2D detection with a lightweight stereo matching module to run stereo 3D detection at more than 10 fps, reaching performance comparable to state-of-the-art stereo methods without LiDAR. It targets low-cost autonomous mobile robots.
Contribution
It introduces a novel framework that simplifies stereo 3D detection by building on 2D detection methods and adding a lightweight stereo matching component.
Findings
Trains on a single GPU and runs at more than 10 fps
Achieves performance comparable to state-of-the-art methods
Does not require LiDAR data for 3D detection
Abstract
Object detection in 3D with stereo cameras is an important problem in computer vision, and is particularly crucial in low-cost autonomous mobile robots without LiDARs. Nowadays, most of the best-performing frameworks for stereo 3D object detection are based on dense depth reconstruction from disparity estimation, making them extremely computationally expensive. To enable real-world deployments of vision detection with binocular images, we take a step back to gain insights from 2D image-based detection frameworks and enhance them with stereo features. We incorporate knowledge and the inference structure from real-time one-stage 2D/3D object detectors and introduce a lightweight stereo matching module. Our proposed framework, YOLOStereo3D, is trained on a single GPU and runs at more than ten fps. It demonstrates performance comparable to state-of-the-art stereo 3D detection…
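The abstract excerpt does not say how the lightweight stereo matching module is built. As a rough illustration of what stereo matching over feature maps can look like, here is a minimal sketch of a generic correlation cost volume in NumPy. This is an assumption made for illustration, not the authors' module: the function name `correlation_cost_volume`, the `(C, H, W)` feature layout, and the `max_disp` parameter are all hypothetical.

```python
import numpy as np


def correlation_cost_volume(left, right, max_disp):
    """Generic correlation cost volume (illustrative, not the paper's module).

    left, right: feature maps of shape (C, H, W) from the two cameras.
    max_disp:    number of candidate disparities to test (0 .. max_disp - 1).
    Returns an array of shape (max_disp, H, W); entry [d, y, x] is the
    channel-averaged product of left[:, y, x] and right[:, y, x - d].
    """
    if left.shape != right.shape:
        raise ValueError("left and right must have the same shape")
    _, h, w = left.shape
    if not 0 < max_disp <= w:
        raise ValueError("max_disp must be in 1..W")

    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        # A left pixel at column x is compared with the right pixel at
        # column x - d; columns x < d have no valid match and stay zero.
        volume[d, :, d:] = (left[:, :, d:] * right[:, :, : w - d]).mean(axis=0)
    return volume
```

A volume like this costs one elementwise product per candidate disparity, which is far cheaper than reconstructing dense depth. Whether YOLOStereo3D uses this exact operation is not stated in the text above.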
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
