OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
Lening Wang, Wenzhao Zheng, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jiwen Lu

TL;DR
OccSora is a diffusion-based 4D occupancy model that simulates 3D scene evolution over time, providing a new way to generate long-term, consistent driving scene videos for autonomous vehicle decision-making.
Contribution
It introduces a novel diffusion transformer framework with a 4D scene tokenizer for efficient, high-quality long-sequence occupancy video generation conditioned on trajectories.
Findings
Generates 16-second 4D occupancy videos with high spatial and temporal fidelity.
Outperforms autoregressive models in long-term scene simulation.
Demonstrates potential as a world simulator for autonomous driving decision-making.
Abstract
Understanding the evolution of 3D scenes is important for effective autonomous driving. While conventional methods model scene development with the motion of individual instances, world models emerge as a generative framework to describe the general scene dynamics. However, most existing methods adopt an autoregressive framework to perform next-token prediction, which suffers from inefficiency in modeling long-term temporal evolutions. To address this, we propose a diffusion-based 4D occupancy generation model, OccSora, to simulate the development of the 3D world for autonomous driving. We employ a 4D scene tokenizer to obtain compact discrete spatial-temporal representations for 4D occupancy input and achieve high-quality reconstruction for long-sequence occupancy videos. We then learn a diffusion transformer on the spatial-temporal representations and generate 4D occupancy conditioned…
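The abstract says the tokenizer produces "compact discrete spatial-temporal representations" but does not say here how the discretization is done. As a rough illustration only, the sketch below assumes a nearest-neighbour codebook lookup (vector quantization), which is one common way to turn continuous features into discrete tokens. The function names, the toy codebook, and the 2-dimensional features are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry.

    Assumption: nearest-neighbour lookup under squared Euclidean distance.
    The paper's actual tokenizer may work differently.
    """
    tokens = []
    for f in features:
        # Squared Euclidean distance from this feature to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def dequantize(tokens: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Recover the (lossy) feature vectors by looking tokens up in the codebook."""
    return [list(codebook[t]) for t in tokens]


# Toy data: three codebook entries and three noisy features (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

tokens = quantize(features, codebook)
reconstructed = dequantize(tokens, codebook)
```

In a setup like this, a generative model would operate on the compact token grid rather than on the raw occupancy volume, and the decoder side would map generated tokens back to occupancy.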
Taxonomy
Topics: Transportation and Mobility Innovations · Vehicular Ad Hoc Networks (VANETs) · Traffic Prediction and Management Techniques
Methods: Diffusion
