InfinityDrive: Breaking Time Limits in Driving World Models
Xi Guo, Chenjing Ding, Haoxuan Dou, Xin Zhang, Weixuan Tang, Wei Wu

TL;DR
InfinityDrive is a novel driving world model that significantly extends temporal horizons and scenario diversity, enabling high-fidelity, coherent, and diverse video generation for autonomous driving applications.
Contribution
It introduces an efficient spatio-temporal co-modeling module and an extended training strategy, achieving over 1500 frames of consistent, high-resolution driving scene generation.
Findings
State-of-the-art fidelity and diversity
Coherent generated videos exceeding 1500 frames
Validated across multiple datasets
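To relate the 1500-frame figure to playback time, the short sketch below converts a frame count into a clip duration. The frame rates in it are illustrative assumptions only, since this summary does not state the rate at which InfinityDrive generates video, and the helper name `clip_duration_seconds` is hypothetical.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback duration in seconds of num_frames shown at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    # Assumed frame rates for illustration; not taken from the paper.
    for fps in (10.0, 12.0, 25.0):
        seconds = clip_duration_seconds(1500, fps)
        print(f"1500 frames at {fps:g} fps = {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Under any of these assumed rates, 1500 frames corresponds to at least one minute of video.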
Abstract
Autonomous driving systems struggle with complex scenarios due to limited access to diverse, extensive, and out-of-distribution driving data, which are critical for safe navigation. World models offer a promising solution to this challenge; however, current driving world models are constrained by short time windows and limited scenario diversity. To bridge this gap, we introduce InfinityDrive, the first driving world model with exceptional generalization capabilities, delivering state-of-the-art performance in high fidelity, consistency, and diversity with minute-scale video generation. InfinityDrive introduces an efficient spatio-temporal co-modeling module paired with an extended temporal training strategy, enabling high-resolution (576×1024) video generation with consistent spatial and temporal coherence. By incorporating memory injection and retention mechanisms alongside an…
Taxonomy
Topics: Data Visualization and Analytics
