CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection
Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston H. Hsu

TL;DR
CrossDTR introduces a lightweight depth predictor and a cross-view transformer to improve 3D object detection accuracy and speed in autonomous driving, especially for small objects like pedestrians.
Contribution
The paper proposes a novel depth-guided transformer framework with a lightweight depth predictor that enhances multi-camera 3D detection performance and efficiency.
Findings
Achieved a 10% improvement in pedestrian detection accuracy.
Surpassed existing methods by 3% in overall mAP and NDS metrics.
Operates 5 times faster than prior approaches.
Abstract
To achieve accurate 3D object detection at low cost for autonomous driving, many multi-camera methods have been proposed that address the occlusion problem of monocular approaches. However, because they lack accurately estimated depth, existing multi-camera methods often generate multiple bounding boxes along a camera ray (i.e., in the depth direction) for difficult small objects such as pedestrians, resulting in extremely low recall. Furthermore, directly attaching depth prediction modules, which are generally composed of large network architectures, to existing multi-camera methods cannot meet the real-time requirements of self-driving applications. To address these issues, we propose Cross-view and Depth-guided Transformers for 3D Object Detection (CrossDTR). First, our lightweight depth predictor is designed to produce precise object-wise sparse depth maps and low-dimensional depth embeddings without extra…
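The depth ambiguity described in the abstract follows from pinhole projection. Every 3D point on a camera ray maps to the same pixel, so image evidence alone cannot determine where along the ray an object lies. The sketch below is a minimal illustration of that geometry only. It is not part of CrossDTR. The intrinsics, pixel, and candidate depths are made-up values, and `project` / `back_project` are hypothetical helper names.

```python
# Illustrative sketch (not CrossDTR code): why a single camera view leaves
# depth ambiguous. Pinhole model with hypothetical intrinsics fx, fy, cx, cy:
#   u = fx * X / Z + cx,   v = fy * Y / Z + cy
# Every point on the ray through the camera centre and (u, v) hits that pixel.


def project(point, fx, fy, cx, cy):
    """Project a camera-frame 3D point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def back_project(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) to the 3D point at the given depth along its ray."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


if __name__ == "__main__":
    intrinsics = (1000.0, 1000.0, 800.0, 450.0)  # hypothetical fx, fy, cx, cy
    pixel = (900.0, 500.0)                       # hypothetical detection centre
    for depth in (10.0, 20.0, 40.0):             # candidate depths in metres
        point = back_project(*pixel, depth, *intrinsics)
        # Each candidate 3D point reprojects to the same pixel.
        print(depth, point, project(point, *intrinsics))
```

All three candidates reproject to pixel (900, 500) even though they lie up to about 30 m apart. A detector without a reliable depth cue can therefore plausibly place a box at several of them, which matches the duplicate-boxes-along-a-ray failure mode the abstract attributes to small objects.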
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Autonomous Vehicle Technology and Safety
