GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection
Jinqing Zhang, Yanan Zhang, Yunlong Qi, Zehua Fu, Qingjie Liu, Yunhong Wang

TL;DR
GeoBEV introduces a high-resolution BEV representation with novel sampling and labeling techniques to enhance multi-view 3D object detection accuracy, achieving state-of-the-art results on nuScenes.
Contribution
The paper proposes Radial-Cartesian BEV Sampling and In-Box Labeling to improve geometric fidelity in BEV representations for 3D detection.
Findings
RC-Sampling outperforms other feature transformation methods at efficiently generating high-resolution, dense BEV representations
Achieves 66.2% NDS on nuScenes test set
Introduces novel loss and sampling strategies for better 3D scene understanding
Abstract
Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection, demonstrating impressive perceptual capabilities. However, existing methods overlook the geometric quality of BEV representation, leaving it in a low-resolution state and failing to restore the authentic geometric information of the scene. In this paper, we identify the drawbacks of previous approaches that limit the geometric quality of BEV representation and propose Radial-Cartesian BEV Sampling (RC-Sampling), which outperforms other feature transformation methods in efficiently generating high-resolution dense BEV representation to restore fine-grained geometric information. Additionally, we design a novel In-Box Label to substitute the traditional depth label generated from the LiDAR points. This label reflects the actual geometric structure of objects rather than just their…
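As a rough illustration of the idea behind the In-Box Label, the sketch below marks every BEV grid cell whose center falls inside an annotated box footprint, so supervision covers the extent of the object rather than individual LiDAR returns. It is not the paper's implementation: the function name `in_box_bev_mask`, the `(cx, cy, length, width, yaw)` box format, and the grid range and cell size are all assumptions made for this example.

```python
import numpy as np

def in_box_bev_mask(boxes, x_range=(-4.0, 4.0), y_range=(-4.0, 4.0), cell=1.0):
    """Return a boolean (H, W) mask of BEV cells whose centers lie in any box.

    boxes: iterable of (cx, cy, length, width, yaw), yaw in radians
           (hypothetical format chosen for this sketch).
    """
    # Cell-center coordinates along each axis.
    xs = np.arange(x_range[0] + cell / 2, x_range[1], cell)
    ys = np.arange(y_range[0] + cell / 2, y_range[1], cell)
    gx, gy = np.meshgrid(xs, ys, indexing="xy")  # both have shape (H, W)

    mask = np.zeros(gx.shape, dtype=bool)
    for cx, cy, length, width, yaw in boxes:
        # Express each cell center in the box's local frame.
        dx, dy = gx - cx, gy - cy
        c, s = np.cos(yaw), np.sin(yaw)
        local_x = c * dx + s * dy
        local_y = -s * dx + c * dy
        # A cell is "in box" if it lies within half the extent on both axes.
        mask |= (np.abs(local_x) <= length / 2) & (np.abs(local_y) <= width / 2)
    return mask

# Example: one 4 m x 2 m box centered at the origin, aligned with the x-axis.
mask = in_box_bev_mask([(0.0, 0.0, 4.0, 2.0, 0.0)])
print(mask.astype(int))
```

A mask like this could serve as a dense per-cell target. How the paper actually builds and uses its label is not recoverable from the truncated abstract above.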
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
