GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation

Zhenya Yang; Zhe Liu; Yuxiang Lu; Liping Hou; Chenxuan Miao; Siyi Peng; Bailan Feng; Xiang Bai; Hengshuang Zhao

arXiv:2512.12751 · cs.CV · December 16, 2025

TL;DR

GenieDrive introduces a physics-informed framework for driving video generation that leverages 4D occupancy, a specialized VAE, and attention mechanisms to produce controllable, multi-view, and physically consistent driving videos efficiently.

Contribution

The paper presents a novel physics-aware driving video generation framework using 4D occupancy, a compact VAE, and attention modules, achieving improved accuracy and efficiency over prior methods.

Findings

1. 7.2% improvement in forecasting mIoU
2. 20.7% reduction in FVD for video quality
3. 41 FPS inference speed with 3.47M parameters
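
For readers unfamiliar with the mIoU metric cited above, here is a minimal sketch of how mean intersection-over-union is commonly computed on semantic occupancy labels. It assumes the predicted and ground-truth voxel grids have been flattened into equal-length sequences of integer class IDs. The function name `mean_iou` and the choice to skip classes absent from both inputs are illustrative assumptions, not the paper's evaluation protocol, which this page does not describe.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that occur in pred or target.

    Assumption: inputs are flattened voxel grids of integer class IDs in
    [0, num_classes). Classes absent from both inputs are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")

    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Voxel counts once toward both intersection and union of its class.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1

    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: four voxels, two classes.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Benchmarks differ on details such as ignoring a free-space class or masking unobserved voxels, so reported numbers depend on the protocol used.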

Abstract

A physics-aware driving world model is essential for drive planning, out-of-distribution data synthesis, and closed-loop evaluation. However, existing methods often rely on a single diffusion model to map driving actions directly to videos, which makes learning difficult and leads to physically inconsistent outputs. To overcome these challenges, we propose GenieDrive, a novel framework designed for physics-aware driving video generation. Our approach starts by generating 4D occupancy, which serves as a physics-informed foundation for subsequent video generation. 4D occupancy contains rich physical information, including high-resolution 3D structures and dynamics. To compress such high-resolution occupancy effectively, we propose a VAE that encodes occupancy into a latent tri-plane representation, reducing the latent size to only 58% of that used in previous methods. We…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Model Reduction and Neural Networks