BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

Zhenxin Li; Shiyi Lan; Jose M. Alvarez; Zuxuan Wu

arXiv:2312.01696 · cs.CV · March 26, 2024 · 1 citation


TL;DR

BEVNeXt revitalizes dense BEV frameworks for 3D object detection by integrating CRF-based depth estimation, temporal aggregation, and a two-stage decoder, achieving state-of-the-art results on nuScenes.

Contribution

The paper introduces BEVNeXt, a modernized dense BEV framework whose new components improve depth estimation and object localization, outperforming prior dense BEV and query-based detectors on nuScenes.

Findings

01. Achieves 64.2 NDS on the nuScenes test set.

02. Outperforms both dense BEV and query-based frameworks.

03. Demonstrates superior depth estimation and localization accuracy.
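
For context on the NDS figure above: the nuScenes Detection Score (NDS) combines mean average precision with five true-positive error terms (translation, scale, orientation, velocity, attribute), each clipped to 1 before being converted to a score. The sketch below follows that standard definition. The function name and the input values are illustrative assumptions; they are not BEVNeXt's reported component metrics and are not taken from the paper or its repository.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """Compute NDS = (5 * mAP + sum(1 - min(1, err))) / 10.

    mean_ap:   mAP in [0, 1]
    tp_errors: the five mean true-positive errors
               (mATE, mASE, mAOE, mAVE, mAAE), each >= 0
    """
    if len(tp_errors) != 5:
        raise ValueError("NDS expects exactly five true-positive error terms")
    # Each error is clipped to 1 so that a very bad term contributes 0, not a negative score.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example = nuscenes_detection_score(0.5, [0.5, 0.5, 0.5, 0.5, 0.5])
```

Because mAP carries half of the total weight, a detector can raise its NDS either by finding more objects or by localizing the ones it finds more precisely.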

Abstract

Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection. These query-based decoders are surpassing the traditional dense BEV (Bird's Eye View)-based methods. However, we argue that dense BEV frameworks remain important due to their outstanding abilities in depth estimation and object localization, depicting 3D scenes accurately and comprehensively. This paper aims to address the drawbacks of the existing dense BEV-based 3D object detectors by introducing our proposed enhanced components, including a CRF-modulated depth estimation module enforcing object-level consistencies, a long-term temporal aggregation module with extended receptive fields, and a two-stage object decoder combining perspective techniques with CRF-modulated depth embedding. These enhancements lead to a "modernized" dense BEV framework dubbed BEVNeXt. On the nuScenes…

Code & Models

Repositories

woxihuanjiangguo/bevnext (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Linear Layer · Attention Is All You Need · Absolute Position Encodings · Dropout · Dense Connections · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer