BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection
Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, Zeming Li

TL;DR
BEVDepth is a camera-based 3D object detector that improves depth estimation through explicit depth supervision, a camera-aware depth estimation module, and a Depth Refinement Module, reaching 60.9% NDS on the nuScenes test set.
Contribution
The paper presents BEVDepth, a novel 3D detector with explicit depth supervision, a camera-aware depth module, and a depth refinement module, advancing camera-based 3D detection accuracy.
Findings
Achieves 60.9% NDS on nuScenes test set
Introduces explicit depth supervision for better depth estimation
Sets new state-of-the-art performance in camera-based 3D detection
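For readers unfamiliar with the metric, NDS (nuScenes Detection Score) combines mean average precision with five mean true-positive error terms (translation, scale, orientation, velocity, attribute), each clipped to 1 and turned into a score. The sketch below shows that weighting only; the function name and the input values are illustrative assumptions and are not BEVDepth's reported per-metric results.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """Combine mAP and five mean true-positive errors into NDS.

    NDS = (5 * mAP + sum(1 - min(1, err) for each TP error)) / 10
    """
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"tp_errors must have exactly the keys {sorted(expected)}")
    # Each error is clipped to 1 so that a very poor metric contributes 0, not a negative score.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Illustrative numbers only (hypothetical, not taken from the paper).
example_errors = {"mATE": 0.5, "mASE": 0.5, "mAOE": 0.5, "mAVE": 0.5, "mAAE": 0.5}
example_nds = nuscenes_detection_score(0.5, example_errors)
```

Because mAP carries half of the total weight, a camera model needs both good detection precision and low localization, velocity, and attribute errors to reach a score such as 60%.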
Abstract
In this research, we propose a new 3D object detector with a trustworthy depth estimation, dubbed BEVDepth, for camera-based Bird's-Eye-View (BEV) 3D object detection. Our work is based on a key observation -- depth estimation in recent approaches is surprisingly inadequate given the fact that depth is essential to camera 3D detection. Our BEVDepth resolves this by leveraging explicit depth supervision. A camera-awareness depth estimation module is also introduced to facilitate the depth predicting capability. Besides, we design a novel Depth Refinement Module to counter the side effects carried by imprecise feature unprojection. Aided by customized Efficient Voxel Pooling and multi-frame mechanism, BEVDepth achieves the new state-of-the-art 60.9% NDS on the challenging nuScenes test set while maintaining high efficiency. For the first time, the NDS score of a camera model reaches 60%.
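The idea of explicit depth supervision can be illustrated with a minimal sketch. It assumes the network predicts, per pixel, a categorical distribution over uniformly spaced depth bins, and that sparse ground-truth depths (for example, from projected LiDAR points) are available for some pixels. The bin range, the bin count, the function names, and the choice of a plain cross-entropy loss are all assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
import numpy as np


def depth_to_bins(depth, d_min, d_max, num_bins):
    """Map metric depths to uniform bin indices; -1 marks pixels without a usable depth."""
    width = (d_max - d_min) / num_bins
    idx = np.floor((depth - d_min) / width).astype(np.int64)
    idx = np.clip(idx, 0, num_bins - 1)  # guard against floating-point edge cases
    valid = (depth >= d_min) & (depth < d_max)
    return np.where(valid, idx, -1)


def depth_cross_entropy(logits, target_bins):
    """Mean cross-entropy over supervised pixels.

    logits: (N, D) unnormalized scores over D depth bins.
    target_bins: (N,) bin indices, with -1 for pixels that carry no supervision.
    """
    valid = target_bins >= 0
    if not valid.any():
        return 0.0
    z = logits[valid]
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(z.shape[0]), target_bins[valid]]
    return float(-picked.mean())


# Hypothetical setup: 112 bins covering 2 m to 58 m; a depth of 0.0 means "no measurement".
gt_depth = np.array([0.0, 2.0, 57.9, 58.0])
gt_bins = depth_to_bins(gt_depth, d_min=2.0, d_max=58.0, num_bins=112)
uniform_logits = np.zeros((4, 112))
loss = depth_cross_entropy(uniform_logits, gt_bins)
```

Only pixels with a valid measurement contribute to the loss, which is why the sketch masks out the `-1` entries before averaging.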
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Video Surveillance and Tracking Methods
Methods: Test · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
