Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
Yu Yang, Jianbiao Mei, Yukai Ma, Siliang Du, Wenqing Chen, Yijie Qian, Yuxiang Feng, Yong Liu

TL;DR
This paper introduces Drive-OccWorld, a vision-centric 4D occupancy forecasting model that integrates semantic and motion information for end-to-end autonomous driving planning, enabling controllable and plausible future state generation.
Contribution
The paper proposes a novel 4D world model with semantic- and motion-conditional normalization and flexible action conditioning to improve end-to-end planning for autonomous driving.
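Action conditioning can be illustrated with a minimal sketch. The visible text does not say how Drive-OccWorld injects actions into its features, so this example assumes FiLM-style feature-wise modulation. A hypothetical linear encoder maps an action vector (here a 2-D velocity) to a per-channel scale and bias, and these are applied at every BEV cell. The names `encode_action` and `film_condition`, the feature layout, and all weights are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence

# Hypothetical BEV feature map stored as [channel][row][col].
BEV = List[List[List[float]]]


def encode_action(action: Sequence[float], weights: List[List[float]],
                  bias: List[float]) -> List[float]:
    """Linear projection of an action vector to one value per feature channel."""
    return [sum(w * a for w, a in zip(row, action)) + b
            for row, b in zip(weights, bias)]


def film_condition(bev: BEV, action: Sequence[float],
                   w_gamma: List[List[float]], b_gamma: List[float],
                   w_beta: List[List[float]], b_beta: List[float]) -> BEV:
    """FiLM-style conditioning (assumed): bev[c] * (1 + gamma[c]) + beta[c] per cell."""
    gamma = encode_action(action, w_gamma, b_gamma)  # per-channel scale offset
    beta = encode_action(action, w_beta, b_beta)     # per-channel shift
    return [[[v * (1.0 + g) + b for v in row] for row in ch]
            for ch, g, b in zip(bev, gamma, beta)]


if __name__ == "__main__":
    bev = [[[1.0, 2.0]], [[3.0, 4.0]]]   # 2 channels on a 1x2 grid
    velocity = [2.0, 0.0]                # hypothetical (vx, vy) action
    out = film_condition(bev, velocity,
                         w_gamma=[[0.5, 0.0], [0.0, 0.0]], b_gamma=[0.0, 0.0],
                         w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.0, 0.0])
    print(out)  # [[[2.0, 4.0]], [[5.0, 6.0]]]
```

Because the scale is written as `1 + gamma`, a zero action with zero biases leaves the features unchanged. Changing the action then shifts the features away from that neutral point, which is one simple way to make a world model's forecasts controllable.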
Findings
Accurately forecasts future occupancy and flow in 4D space.
Enables controllable generation with various action inputs.
Demonstrates superior performance on the nuScenes and Lyft datasets.
Abstract
World models envision potential future states based on various ego actions. They embed extensive knowledge about the driving environment, facilitating safe and scalable autonomous driving. Most existing methods primarily focus on either data generation or the pretraining paradigms of world models. Unlike the aforementioned prior works, we propose Drive-OccWorld, which adapts a vision-centric 4D forecasting world model to end-to-end planning for autonomous driving. Specifically, we first introduce a semantic and motion-conditional normalization in the memory module, which accumulates semantic and dynamic information from historical BEV embeddings. These BEV features are then conveyed to the world decoder for future occupancy and flow forecasting, considering both geometry and spatiotemporal modeling. Additionally, we propose injecting flexible action conditions, such as velocity,…
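The semantic-conditional normalization in the memory module can be sketched under similar caveats. This example assumes a SPADE-style design. BEV features are first normalized per channel over the spatial grid. Each cell is then re-scaled and shifted using parameters computed from that cell's semantic class probabilities through a linear map that acts like a 1x1 convolution. The function names, weight shapes, and toy inputs are hypothetical, and the motion-conditional branch is omitted.

```python
import math
from typing import List

# Hypothetical layout: [channel][row][col] for features, [class][row][col] for semantics.
Grid = List[List[List[float]]]


def instance_norm(x: Grid, eps: float = 1e-5) -> Grid:
    """Normalize each channel to zero mean and unit variance over the H x W grid."""
    out = []
    for ch in x:
        vals = [v for row in ch for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        std = math.sqrt(var + eps)
        out.append([[(v - mean) / std for v in row] for row in ch])
    return out


def semantic_conditional_norm(x: Grid, sem_probs: Grid,
                              w_gamma: List[List[float]],
                              w_beta: List[List[float]]) -> Grid:
    """SPADE-style modulation (assumed): per-cell scale and bias from semantics.

    sem_probs: per-cell class probabilities, shape [num_classes][H][W].
    w_gamma, w_beta: shape [C][num_classes], applied like a 1x1 convolution.
    """
    normed = instance_norm(x)
    num_classes = len(sem_probs)
    out = []
    for c, ch in enumerate(normed):
        new_ch = []
        for i, row in enumerate(ch):
            new_row = []
            for j, v in enumerate(row):
                # Scale and shift depend on what class this BEV cell is predicted to be.
                gamma = sum(w_gamma[c][k] * sem_probs[k][i][j] for k in range(num_classes))
                beta = sum(w_beta[c][k] * sem_probs[k][i][j] for k in range(num_classes))
                new_row.append(v * (1.0 + gamma) + beta)
            new_ch.append(new_row)
        out.append(new_ch)
    return out
```

With zero weights the output is just the normalized feature map. Nonzero `w_gamma` or `w_beta` let cells with different predicted classes, such as vehicle versus drivable surface, receive different modulation. This class-dependent behavior is the property a semantic condition is meant to provide.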
Taxonomy
Topics: Data Management and Algorithms · Transportation and Mobility Innovations
