Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Xianpeng Liu; Ce Zheng; Ming Qian; Nan Xue; Chen Chen; Zhebin Zhang; Chen Li; Tianfu Wu

arXiv:2405.12200·cs.CV·May 21, 2024

Open Access

TL;DR

This paper introduces MvACon, a novel attentive contextualization method that enhances multi-view 3D object detection by effectively utilizing dense scene-level features, leading to improved accuracy across multiple benchmarks.

Contribution

MvACon provides a dense yet computationally efficient attentive contextualization scheme that improves 2D-to-3D feature lifting in multi-view 3D detection, agnostic to specific lifting approaches.

Findings

01. Consistent performance improvements on the nuScenes benchmark.

02. Enhanced detection accuracy in location, orientation, and velocity.

03. Effective encoding of dense scene-level contexts.

Abstract

We present Multi-View Attentive Contextualization (MvACon), a simple yet effective method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object detection. Despite remarkable progress in query-based MV3D object detection, prior art often suffers from one of two limitations: dense attention-based lifting fails to exploit high-resolution 2D features because of high computational costs, while sparse attention-based lifting grounds 3D queries in multi-scale 2D features too sparsely. Our proposed MvACon addresses both limitations at once, using a representationally dense yet computationally sparse attentive feature contextualization scheme that is agnostic to specific 2D-to-3D feature lifting approaches. In experiments, the proposed MvACon is thoroughly tested on the nuScenes benchmark, using both the BEVFormer and its recent 3D…

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods