DenseBEV: Transforming BEV Grid Cells into 3D Objects

Marius Dähling; Sebastian Krebs; J. Marius Zöllner

arXiv:2512.16818 · cs.CV · December 19, 2025

TL;DR

DenseBEV introduces a novel end-to-end approach for 3D object detection using BEV feature cells as anchors, improving detection accuracy especially for small objects and achieving state-of-the-art results on major datasets.

Contribution

The paper proposes using BEV feature cells directly as anchors and incorporates a hybrid temporal modeling approach, enhancing efficiency and detection performance in multi-camera 3D object detection.

Findings

01 Significant improvements in NDS and mAP on nuScenes.

02 Enhanced pedestrian detection, with a 3.8% mAP increase.

03 State-of-the-art performance on the Waymo dataset, with 60.7% LET-mAP.

Abstract

In current research, Bird's-Eye-View (BEV)-based transformers are increasingly utilized for multi-camera 3D object detection. Traditional models often employ random queries as anchors, optimizing them successively. Recent advancements complement or replace these random queries with detections from auxiliary networks. We propose a more intuitive and efficient approach by using BEV feature cells directly as anchors. This end-to-end approach leverages the dense grid of BEV queries, considering each cell as a potential object for the final detection task. As a result, we introduce a novel two-stage anchor generation method specifically designed for multi-camera 3D object detection. To address the scaling issues of attention with a large number of queries, we apply BEV-based Non-Maximum Suppression, allowing gradients to flow only through non-suppressed objects. This ensures efficient…
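To make the idea of treating grid cells as anchors and pruning them with BEV-based suppression more concrete, here is a minimal sketch. It is not the authors' implementation: the local-maximum rule, the function names `bev_cell_nms` and `cell_center`, and the parameters `radius`, `threshold`, `cell_size`, `x_min`, and `y_min` are all assumptions chosen for illustration. Each cell carries an objectness score, a cell survives only if it is the highest-scoring cell in its neighborhood, and the surviving cells are mapped to metric BEV positions that could serve as anchors.

```python
from typing import List, Tuple


def bev_cell_nms(
    scores: List[List[float]], radius: int = 1, threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Keep cells that are local maxima of the BEV score grid.

    Illustrative stand-in for BEV-based NMS (an assumption, not the
    paper's exact rule): a cell is kept if its score is at least
    `threshold` and no cell within `radius` has a higher score.
    Cells with tied scores are all kept.
    """
    rows, cols = len(scores), len(scores[0])
    kept = []
    for r in range(rows):
        for c in range(cols):
            s = scores[r][c]
            if s < threshold:
                continue
            # Clip the neighborhood window to the grid borders.
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            neighborhood_max = max(
                scores[i][j] for i in range(r0, r1) for j in range(c0, c1)
            )
            if s >= neighborhood_max:
                kept.append((r, c))
    return kept


def cell_center(
    row: int, col: int, cell_size: float = 0.5,
    x_min: float = -1.0, y_min: float = -1.0,
) -> Tuple[float, float]:
    """Map a grid index to the metric (x, y) center of that cell.

    The grid origin and cell size are hypothetical values.
    """
    return (x_min + (col + 0.5) * cell_size, y_min + (row + 0.5) * cell_size)


# Toy 4x4 score grid: one strong peak and two adjacent weaker candidates.
scores = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.1, 0.7],
]
kept_cells = bev_cell_nms(scores)
anchors = [cell_center(r, c) for r, c in kept_cells]
print(kept_cells)  # [(1, 1), (3, 3)]
print(anchors)     # [(-0.25, -0.25), (0.75, 0.75)]
```

In an automatic-differentiation framework, indexing the query tensor with only the kept cells would typically route gradients through those cells alone, which is one plausible reading of "allowing gradients to flow only through non-suppressed objects".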

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques