DVPE: Divided View Position Embedding for Multi-View 3D Object Detection

Jiasen Wang; Zhenglin Li; Ke Sun; Xianyuan Liu; Yang Zhou

arXiv:2407.16955·cs.CV·July 25, 2024

TL;DR

DVPE introduces a divided view position embedding method for multi-view 3D object detection, effectively balancing receptive field expansion and interference reduction, while incorporating temporal information for state-of-the-art results.

Contribution

The paper proposes a novel divided view position embedding approach that decouples position encoding from camera poses and integrates temporal features, improving multi-view 3D detection performance.

Findings

01. Achieves 57.2% mAP and 64.5% NDS on nuScenes

02. Reduces interference in multi-view feature aggregation

03. Enhances training stability with a one-to-many assignment strategy

Abstract

Sparse query-based paradigms have achieved significant success in multi-view 3D detection for autonomous vehicles. Current research struggles to balance enlarging receptive fields against reducing interference when aggregating multi-view features. Moreover, the differing poses of cameras make global attention models harder to train. To address these problems, this paper proposes a divided view method, in which features are modeled globally via the visibility cross-attention mechanism, but interact only with partial features in a divided local virtual space. This effectively reduces interference from other irrelevant features and alleviates the training difficulties of the transformer by decoupling the position embedding from camera poses. Additionally, 2D historical RoI features are incorporated into the object-centric temporal modeling to utilize high-level visual…
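As a rough illustration of the idea that features "interact only with partial features in a divided local virtual space", the sketch below restricts cross-attention so that each query sees only the keys assigned to its own partition. It is not the authors' implementation: the angular partition rule (`sector_of`), the function `masked_cross_attention`, and the choice of four sectors are all assumptions made for this example, and the abstract does not specify how the space is divided.

```python
import math

def sector_of(x, y, num_sectors):
    """Hypothetical partition rule: map a bird's-eye-view point to an angular sector."""
    angle = math.atan2(y, x) % (2 * math.pi)
    # Clamp guards against a floating-point angle that rounds up to exactly 2*pi.
    return min(int(angle / (2 * math.pi / num_sectors)), num_sectors - 1)

def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_cross_attention(queries, query_sectors, keys, values, key_sectors):
    """Scaled dot-product attention in which a query attends only to keys
    that share its sector (an assumed stand-in for a visibility mask)."""
    dim = len(queries[0])
    value_dim = len(values[0])
    outputs = []
    for query, q_sector in zip(queries, query_sectors):
        visible = [i for i, k_sector in enumerate(key_sectors) if k_sector == q_sector]
        if not visible:
            # No feature falls in this query's sector: return a zero vector.
            outputs.append([0.0] * value_dim)
            continue
        scores = [
            sum(a * b for a, b in zip(query, keys[i])) / math.sqrt(dim)
            for i in visible
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * values[i][d] for w, i in zip(weights, visible))
            for d in range(value_dim)
        ])
    return outputs

# Toy data: three image features, two in sector 0 and one in sector 2.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
key_sectors = [sector_of(1.0, 0.1, 4), sector_of(0.5, 0.5, 4), sector_of(-1.0, -0.2, 4)]

queries = [[1.0, 0.0], [0.5, 0.5]]
query_sectors = [0, 2]
attended = masked_cross_attention(queries, query_sectors, keys, values, key_sectors)
```

In this toy setup the first query mixes only the two sector-0 values, so the sector-2 feature cannot interfere with it, while the second query sees the sector-2 feature alone. A real model would apply the mask inside batched tensor attention rather than Python loops.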

Code & Models

Repositories

dop0/dvpe (PyTorch, official)

Taxonomy

Methods: Softmax · Attention Is All You Need