BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents

Yumeng Zhang; Shi Gong; Kaixin Xiong; Xiaoqing Ye; Xiaofan Li; Xiao Tan; Fan Wang; Jizhou Huang; Hua Wu; Haifeng Wang

arXiv:2407.05679 · cs.CV · May 1, 2025 · 2 cites

TL;DR

BEVWorld introduces a unified multimodal BEV latent space for holistic environment modeling in autonomous driving, enabling realistic future scene generation and improved downstream task performance.

Contribution

The paper presents a framework that combines a multi-modal tokenizer with a latent BEV sequence diffusion model for joint scene encoding and future forecasting.

Findings

01. Effective in generating realistic future scenes

02. Improves perception and motion prediction tasks

03. Demonstrates strong performance on autonomous driving benchmarks

Abstract

World models have attracted increasing attention in autonomous driving for their ability to forecast potential future scenarios. In this paper, we propose BEVWorld, a novel framework that transforms multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for holistic environment modeling. The proposed world model consists of two main components: a multi-modal tokenizer and a latent BEV sequence diffusion model. The multi-modal tokenizer first encodes heterogeneous sensory data, and its decoder reconstructs the latent BEV tokens into LiDAR and surround-view image observations via ray-casting rendering in a self-supervised manner. This enables joint modeling and bidirectional encoding-decoding of panoramic imagery and point cloud data within a shared spatial representation. On top of this, the latent BEV sequence diffusion model performs temporally…
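To make the two-stage data flow in the abstract concrete, the following is a toy sketch only, not the authors' implementation. `encode_to_bev` is a hypothetical stand-in for the multi-modal tokenizer: it bins LiDAR points into a BEV occupancy channel and writes a crude image statistic into a second channel. `add_noise` is the standard DDPM forward-noising step applied to a sequence of BEV latents; the grid size, channel count, spatial extent, and noise level are all assumed values chosen for illustration.

```python
import numpy as np


def encode_to_bev(images, points, grid=8, channels=4, extent=50.0):
    """Hypothetical tokenizer stand-in: fuse camera and LiDAR data into one BEV grid.

    images: array of shape (num_cams, H, W, 3)
    points: array of shape (N, 3) with x, y, z in metres
    Returns a (channels, grid, grid) latent covering [-extent, extent) in x and y.
    """
    bev = np.zeros((channels, grid, grid), dtype=np.float32)
    # Channel 0: count LiDAR points falling into each BEV cell.
    ij = np.floor((points[:, :2] + extent) / (2 * extent) * grid).astype(int)
    keep = np.all((ij >= 0) & (ij < grid), axis=1)
    np.add.at(bev[0], (ij[keep, 1], ij[keep, 0]), 1.0)
    # Channel 1: a crude global image feature (assumption, for shape only).
    bev[1] += images.mean()
    return bev


def add_noise(latents, alpha_bar, rng):
    """Standard DDPM forward step: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    noise = rng.standard_normal(latents.shape).astype(np.float32)
    noisy = np.sqrt(alpha_bar) * latents + np.sqrt(1.0 - alpha_bar) * noise
    return noisy, noise


rng = np.random.default_rng(0)
images = rng.random((6, 32, 32, 3))            # six surround-view frames
points = rng.uniform(-40.0, 40.0, (1000, 3))   # one LiDAR sweep
# Encode three timesteps into a BEV latent sequence of shape (T, C, H, W).
sequence = np.stack([encode_to_bev(images, points) for _ in range(3)])
noisy_sequence, eps = add_noise(sequence, alpha_bar=0.5, rng=rng)
print(sequence.shape, noisy_sequence.shape)
```

In a real system the encoder would be learned, a denoising network would be trained to recover `eps` from `noisy_sequence`, and a decoder would render the denoised latents back into images and point clouds; none of those parts are modelled here.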

Code & Models

Repositories

zympsyche/bevworld (Official)

Taxonomy

Topics: Data Management and Algorithms · Topic Modeling · Big Data Technologies and Applications

Methods: Softmax · Attention Is All You Need · Diffusion