BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

Yunpeng Zhang; Zheng Zhu; Wenzhao Zheng; Junjie Huang; Guan Huang; Jie Zhou; Jiwen Lu

arXiv:2205.09743·cs.CV·May 20, 2022·83 cites

TL;DR

BEVerse is a unified multi-task framework that leverages spatio-temporal BEV representations from multi-camera videos to improve perception and prediction in autonomous driving, outperforming single-task methods.

Contribution

The paper introduces BEVerse, a unified framework that jointly performs perception and prediction on multi-camera BEV representations, with new modules such as the grid sampler and iterative flow.

Findings

1. Outperforms existing single-task methods on nuScenes
2. Improves 3D object detection and semantic map construction
3. Enhances motion prediction accuracy

Abstract

In this paper, we present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems. Unlike existing studies that focus on improving single-task approaches, BEVerse produces spatio-temporal Birds-Eye-View (BEV) representations from multi-camera videos and jointly reasons about multiple tasks for vision-centric autonomous driving. Specifically, BEVerse first performs shared feature extraction and lifting to generate 4D BEV representations from multi-timestamp and multi-view images. After ego-motion alignment, a spatio-temporal encoder is used for further feature extraction in BEV. Finally, multiple task decoders are attached for joint reasoning and prediction. Within the decoders, we propose the grid sampler to generate BEV features with different ranges and granularities for different tasks. Also, we design the method of…
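
The grid sampler mentioned in the abstract can be pictured as resampling one shared BEV map onto a task-specific window and cell size. The following is a minimal sketch of that idea only, assuming a square, ego-centred, single-channel grid and bilinear interpolation; the function names, parameters, and numeric values are hypothetical and are not taken from the paper or from the zhangyp15/beverse repository.

```python
from math import floor


def bilinear(bev, x, y):
    """Interpolate a 2D grid at fractional index (x, y); cells outside count as 0."""
    h, w = len(bev), len(bev[0])
    x0, y0 = floor(x), floor(y)
    fx, fy = x - x0, y - y0
    total = 0.0
    for yi, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xi, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= yi < h and 0 <= xi < w:
                total += wy * wx * bev[yi][xi]
    return total


def grid_sample_bev(bev, src_range, src_res, dst_range, dst_res):
    """Resample an ego-centred square BEV map to a new range and resolution.

    src_range, dst_range: half-extent in metres covered along each axis
                          (illustrative parameters).
    src_res, dst_res:     cell size in metres.
    """
    n = round(2 * dst_range / dst_res)  # output cells per side
    out = []
    for i in range(n):
        # Metric position of the centre of output row i, then its
        # fractional row index in the source grid.
        ym = -dst_range + (i + 0.5) * dst_res
        ys = (ym + src_range) / src_res - 0.5
        row = []
        for j in range(n):
            xm = -dst_range + (j + 0.5) * dst_res
            xs = (xm + src_range) / src_res - 0.5
            row.append(bilinear(bev, xs, ys))
        out.append(row)
    return out


# Toy shared BEV map: 4 x 4 cells, 1 m per cell, covering [-2 m, 2 m].
shared = [[x + 10 * y for x in range(4)] for y in range(4)]

# A hypothetical fine-grained task: inner [-1 m, 1 m] window at 0.5 m cells.
fine = grid_sample_bev(shared, src_range=2.0, src_res=1.0,
                       dst_range=1.0, dst_res=0.5)

# A hypothetical coarse task that keeps the full range and resolution.
same = grid_sample_bev(shared, src_range=2.0, src_res=1.0,
                       dst_range=2.0, dst_res=1.0)
```

Here `fine` is a 4 x 4 crop of the central region at twice the resolution, while `same` reproduces the shared map unchanged. In a learned model the same resampling would typically be applied per feature channel, so that each decoder receives features at the range and granularity its task needs.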

Code & Models

Repositories

zhangyp15/beverse (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods