InfinityDrive: Breaking Time Limits in Driving World Models

Xi Guo; Chenjing Ding; Haoxuan Dou; Xin Zhang; Weixuan Tang; Wei Wu

arXiv:2412.01522 · cs.CV · December 5, 2024

TL;DR

InfinityDrive is a novel driving world model that significantly extends temporal horizons and scenario diversity, enabling high-fidelity, coherent, and diverse video generation for autonomous driving applications.

Contribution

It introduces an efficient spatio-temporal co-modeling module and an extended training strategy, achieving over 1500 frames of consistent, high-resolution driving scene generation.
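As a rough illustration of what "over 1500 frames" at 576 × 1024 means in practice, the sketch below converts a frame count into clip duration and uncompressed RGB size. The frame rate is not stated on this page, so the 10 fps used here is a hypothetical value chosen only for the arithmetic. The helper names are also illustrative and do not come from the paper.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Duration of a clip in seconds, given its frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def raw_rgb_bytes(num_frames: int, height: int, width: int,
                  channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Uncompressed size of a clip, assuming 8-bit RGB unless told otherwise."""
    return num_frames * height * width * channels * bytes_per_channel


# Frame count and resolution are taken from the summary above.
NUM_FRAMES = 1500
HEIGHT, WIDTH = 576, 1024

# ASSUMPTION: 10 fps is a hypothetical frame rate, not a figure from the paper.
ASSUMED_FPS = 10.0

duration = clip_duration_seconds(NUM_FRAMES, ASSUMED_FPS)
size_bytes = raw_rgb_bytes(NUM_FRAMES, HEIGHT, WIDTH)

print(f"{NUM_FRAMES} frames at {ASSUMED_FPS} fps = {duration:.0f} s")
print(f"Uncompressed 8-bit RGB size: {size_bytes / 2**30:.2f} GiB")
```

Under that assumed frame rate, 1500 frames corresponds to a clip of a few minutes, which is consistent with the "minute-scale" wording in the abstract. A different frame rate would scale the duration proportionally.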

Findings

01 State-of-the-art performance in high fidelity and diversity

02 Generated videos exceeding 1500 frames with coherence

03 Validated across multiple datasets

Abstract

Autonomous driving systems struggle with complex scenarios due to limited access to diverse, extensive, and out-of-distribution driving data, which are critical for safe navigation. World models offer a promising solution to this challenge; however, current driving world models are constrained by short time windows and limited scenario diversity. To bridge this gap, we introduce InfinityDrive, the first driving world model with exceptional generalization capabilities, delivering state-of-the-art performance in fidelity, consistency, and diversity with minute-scale video generation. InfinityDrive introduces an efficient spatio-temporal co-modeling module paired with an extended temporal training strategy, enabling high-resolution (576 $\times$ 1024) video generation with consistent spatial and temporal coherence. By incorporating memory injection and retention mechanisms alongside an…

Taxonomy

Topics: Data Visualization and Analytics