Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

Yangguang Li; Bin Huang; Zeren Chen; Yufeng Cui; Feng Liang; Mingzhu Shen; Fenggang Liu; Enze Xie; Lu Sheng; Wanli Ouyang; Jing Shao

arXiv:2301.12511·cs.CV·July 10, 2024·6 cites

Open Access 1 Repo

TL;DR

Fast-BEV introduces a lightweight, efficient BEV perception framework for autonomous vehicles that achieves high speed and competitive accuracy without relying on complex transformer-based transformations.

Contribution

The paper proposes a novel, resource-efficient BEV perception framework that eliminates the need for expensive transformations, enabling faster inference on on-vehicle chips.

Findings

01. The R50 model runs at 52.6 FPS with 47.3% NDS on nuScenes.
02. Outperforms existing BEV methods in speed and accuracy.
03. The largest model achieves 53.5% NDS, demonstrating strong performance.

Abstract

Recently, perception tasks based on Bird's-Eye View (BEV) representation have drawn more and more attention, and BEV representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources to execute on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing faster BEV perception on on-vehicle chips. Towards this goal, we first empirically find that the BEV representation can be sufficiently powerful without expensive transformer-based transformation or depth representation. Fast-BEV consists of five parts. We propose (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder which…
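
As an illustration of item (1), the following is a minimal NumPy sketch of one way a depth-free, transformer-free view transformation could work: voxel centers are projected into each camera once, the resulting voxel-to-pixel mapping is stored as a lookup table, and 2D features are then copied into voxel space by plain indexing. Everything here is an assumption made for illustration and is not taken from the excerpt above or from the official repository: the function names `build_lookup_table` and `lift_to_voxels`, the `(camera, row, col)` table layout, the first-camera-wins rule for voxels seen by several cameras, and the nearest-pixel sampling.

```python
import numpy as np

def build_lookup_table(voxel_centers, projections, feat_hw):
    """Map each voxel to one (camera, row, col) feature location, or -1 if unseen.

    voxel_centers: (N, 3) points in the ego frame.
    projections:   (C, 3, 4) matrices mapping ego-frame points to feature-map pixels.
    feat_hw:       (H, W) size of each 2D feature map.
    All names and layouts are hypothetical, chosen for this sketch only.
    """
    h, w = feat_hw
    n = voxel_centers.shape[0]
    homog = np.concatenate([voxel_centers, np.ones((n, 1))], axis=1)  # (N, 4)
    lut = np.full((n, 3), -1, dtype=np.int64)
    for cam, proj in enumerate(projections):
        uvw = homog @ proj.T                      # (N, 3): [u*d, v*d, d]
        depth = uvw[:, 2]
        in_front = depth > 1e-6                   # ignore points behind the camera
        safe_depth = np.where(in_front, depth, 1.0)
        col = np.floor(uvw[:, 0] / safe_depth).astype(np.int64)
        row = np.floor(uvw[:, 1] / safe_depth).astype(np.int64)
        visible = in_front & (col >= 0) & (col < w) & (row >= 0) & (row < h)
        # Assumed rule: the first camera that sees a voxel keeps it.
        take = visible & (lut[:, 0] < 0)
        lut[take, 0] = cam
        lut[take, 1] = row[take]
        lut[take, 2] = col[take]
    return lut

def lift_to_voxels(features, lut):
    """Gather (C, H, W, D) image features into (N, D) voxel features.

    Voxels that no camera sees are left as zeros.
    """
    out = np.zeros((lut.shape[0], features.shape[-1]), dtype=features.dtype)
    hit = lut[:, 0] >= 0
    out[hit] = features[lut[hit, 0], lut[hit, 1], lut[hit, 2]]
    return out
```

Because camera calibration is fixed for a given vehicle, a table like this would only need to be built once offline; at inference time the transformation reduces to a single indexed gather, which is the kind of operation that tends to deploy easily on constrained hardware.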

Code & Models

Repositories

sense-gvt/fast-bev (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings