StereoDETR: Stereo-based Transformer for 3D Object Detection
Shiyi Mu, Zichong Gu, Zhiqi Ai, Anqi Liu, Yilin Gao, Shugong Xu

TL;DR
StereoDETR is an efficient stereo-based 3D object detection framework that combines a monocular DETR branch and a stereo branch, achieving real-time inference and state-of-the-art accuracy on the KITTI benchmark.
Contribution
StereoDETR introduces a dual-branch architecture in which a monocular DETR branch and a stereo branch are coupled through a differentiable depth sampling strategy, improving both speed and accuracy in stereo 3D detection.
Findings
Achieves real-time inference, running faster than monocular methods.
Sets new state-of-the-art results on KITTI pedestrian and cyclist detection.
Maintains competitive accuracy with reduced computational overhead.
Abstract
Compared to monocular 3D object detection, stereo-based 3D methods offer significantly higher accuracy but still suffer from high computational overhead and latency. The state-of-the-art stereo 3D detection method achieves twice the accuracy of monocular approaches, yet its inference speed is only half as fast. In this paper, we propose StereoDETR, an efficient stereo 3D object detection framework based on DETR. StereoDETR consists of two branches: a monocular DETR branch and a stereo branch. The DETR branch is built upon 2D DETR with additional channels for predicting object scale, orientation, and sampling points. The stereo branch leverages low-cost multi-scale disparity features to predict object-level depth maps. These two branches are coupled solely through a differentiable depth sampling strategy. To handle occlusion, we introduce a constrained supervision strategy for sampling…
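The abstract says the two branches meet only at a differentiable depth sampling step: the DETR branch predicts sampling points, and depth is read from the stereo branch's object-level depth map at those points. The sketch below illustrates one common way to make such a read-out differentiable, namely bilinear interpolation at continuous pixel coordinates. It is an assumption for illustration, not the paper's implementation: the function names `bilinear_sample` and `object_depth`, the clamping at the borders, and the averaging of samples are all hypothetical. It is written in plain Python to show the arithmetic; in an autograd framework the same formula lets gradients flow to both the depth values and the point coordinates.

```python
from typing import List, Sequence, Tuple


def bilinear_sample(depth_map: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Read depth_map at continuous pixel coordinates (x, y) using bilinear weights."""
    h, w = len(depth_map), len(depth_map[0])
    # Clamp so that points predicted slightly outside the map remain usable (assumption).
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = depth_map[y0][x0] * (1 - fx) + depth_map[y0][x1] * fx
    bottom = depth_map[y1][x0] * (1 - fx) + depth_map[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def object_depth(depth_map: Sequence[Sequence[float]],
                 points: List[Tuple[float, float]]) -> float:
    """Average the depths sampled at an object's predicted points (hypothetical aggregation)."""
    samples = [bilinear_sample(depth_map, x, y) for x, y in points]
    return sum(samples) / len(samples)


if __name__ == "__main__":
    toy_depth = [[10.0, 20.0],
                 [30.0, 40.0]]
    print(bilinear_sample(toy_depth, 0.5, 0.5))
    print(object_depth(toy_depth, [(0.0, 0.0), (1.0, 1.0)]))
```

Because the output is a weighted sum whose weights depend linearly on the fractional offsets, a small shift in a sampling point changes the sampled depth smoothly, which is what allows a loss on depth to also train where the points are placed.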
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
