Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection

Zhili Chen; Shuangjie Xu; Maosheng Ye; Zian Qian; Xiaoyi Zou; Dit-Yan Yeung; Qifeng Chen

arXiv:2407.15354·cs.CV·July 23, 2024

TL;DR

This paper introduces VectorFormer, a camera-based 3D object detector that pairs a high-resolution vector representation with a lower-resolution BEV representation to improve detection accuracy and efficiency in autonomous driving scenarios.

Contribution

The paper proposes a novel high-resolution vector representation and two modules, vector scattering and gathering, to enhance 3D object detection performance.

Findings

01. Achieves state-of-the-art results on the nuScenes dataset.

02. Improves both inference time and detection accuracy (NDS).

03. Shows consistent performance gains when applied to query-BEV-based methods.

Abstract

The Bird's-Eye-View (BEV) representation is a critical factor that directly impacts the 3D object detection performance, but the traditional BEV grid representation induces quadratic computational cost as the spatial resolution grows. To address this limitation, we present a new camera-based 3D object detector with high-resolution vector representation: VectorFormer. The presented high-resolution vector representation is combined with the lower-resolution BEV representation to efficiently exploit 3D geometry from multi-camera images at a high resolution through our two novel modules: vector scattering and gathering. To this end, the learned vector representation with richer scene contexts can serve as the decoding query for final predictions. We conduct extensive experiments on the nuScenes dataset and demonstrate state-of-the-art performance in NDS and inference time. Furthermore, we…
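The quadratic cost mentioned in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it counts the cells in a square BEV grid as the number of cells per side grows, and compares that with a hypothetical layout of two 1D vectors, one per BEV axis. The two-vector layout, the function names, and the side lengths are all assumptions made for illustration; the abstract does not specify how the vector representation is laid out.

```python
def bev_grid_cells(side_cells: int) -> int:
    """Cells in a square BEV grid with `side_cells` cells per side."""
    return side_cells * side_cells


def axis_vector_cells(side_cells: int) -> int:
    """Elements in two 1D vectors, one per BEV axis.

    Hypothetical layout, assumed here only to contrast linear
    growth with the quadratic growth of the dense grid.
    """
    return 2 * side_cells


if __name__ == "__main__":
    # Illustrative resolutions; these are not settings from the paper.
    for side in (50, 100, 200, 400):
        grid = bev_grid_cells(side)
        vectors = axis_vector_cells(side)
        print(f"side={side:4d}  grid cells={grid:7d}  vector elements={vectors:4d}")
```

Under these assumptions, doubling the resolution per side quadruples the number of grid cells but only doubles the number of vector elements, which is the kind of gap that motivates keeping the dense BEV grid at a lower resolution.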

Code & Models

Repositories

zlichen/vectorformer (Official)

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques