CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

Ching-Yu Tseng; Yi-Rong Chen; Hsin-Ying Lee; Tsung-Han Wu; Wen-Chin Chen; Winston H. Hsu

arXiv:2209.13507 · cs.CV · February 6, 2023

TL;DR

CrossDTR introduces a lightweight depth predictor and a cross-view transformer to improve 3D object detection accuracy and speed in autonomous driving, especially for small objects like pedestrians.

Contribution

The paper proposes a novel depth-guided transformer framework with a lightweight depth predictor that enhances multi-camera 3D detection performance and efficiency.

Findings

1. Achieved a 10% improvement in pedestrian detection accuracy.

2. Surpassed existing methods by 3% in overall mAP and NDS.

3. Runs 5 times faster than prior approaches.

Abstract

To achieve accurate 3D object detection at a low cost for autonomous driving, many multi-camera methods have been proposed to solve the occlusion problem of monocular approaches. However, because they lack accurate depth estimates, existing multi-camera methods often generate multiple bounding boxes along a ray in the depth direction for difficult small objects such as pedestrians, resulting in extremely low recall. Furthermore, directly adding depth prediction modules to existing multi-camera methods, which generally rely on large network architectures, cannot meet the real-time requirements of self-driving applications. To address these issues, we propose Cross-view and Depth-guided Transformers for 3D Object Detection, CrossDTR. First, our lightweight depth predictor is designed to produce precise object-wise sparse depth maps and low-dimensional depth embeddings without extra…
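
The depth ambiguity described in the abstract can be illustrated with a minimal pinhole-camera sketch. This is not the paper's method or code. The intrinsics, pixel location, and candidate depths below are hypothetical values chosen for illustration. The sketch shows that 3D points at different depths along one camera ray all project to the same pixel, which is why a detector without a reliable depth estimate may place several boxes along that ray.

```python
# Illustrative only: hypothetical pinhole intrinsics, not taken from the paper.
FX, FY = 1000.0, 1000.0   # focal lengths in pixels (assumed)
CX, CY = 800.0, 450.0     # principal point in pixels (assumed)


def backproject(u, v, depth):
    """Lift pixel (u, v) to a 3D camera-frame point at the given depth."""
    x = (u - CX) * depth / FX
    y = (v - CY) * depth / FY
    return (x, y, depth)


def project(point):
    """Project a 3D camera-frame point back to pixel coordinates."""
    x, y, z = point
    return (FX * x / z + CX, FY * y / z + CY)


# One image location (e.g. the centre of a pedestrian) and several
# candidate depths in metres along the corresponding camera ray.
pixel = (900.0, 500.0)
candidate_depths = [5.0, 10.0, 20.0, 40.0]

candidates = [backproject(pixel[0], pixel[1], d) for d in candidate_depths]
reprojected = [project(p) for p in candidates]

for depth, point, uv in zip(candidate_depths, candidates, reprojected):
    print(f"depth={depth:5.1f} m  point={point}  reprojects to {uv}")
```

Every candidate reprojects to the same pixel, so image evidence alone cannot distinguish between them. An explicit depth cue, such as the object-wise depth maps the abstract mentions, is one way to resolve the ambiguity.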

Code & Models

Repositories

sty61010/crossdtr (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Autonomous Vehicle Technology and Safety