Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms
Muyang He, Hanzhong Guo, Junxiong Lin, Yizhou Yu

TL;DR
This paper reviews efficient video generation methods for world modeling, introduces a three-part taxonomy, and argues that efficiency is essential for interactive applications such as autonomous driving, embodied AI, and game simulation.
Contribution
It presents a new taxonomy of efficient video generation along three dimensions (modeling paradigms, network architectures, and inference algorithms) and highlights their role in practical world simulation.
Findings
Bridging the efficiency gap enables interactive applications such as autonomous driving, embodied AI, and game simulation.
Efficient modeling improves scalability for complex physical dynamics.
Emerging frontiers focus on real-time, robust world simulators.
Abstract
The rapid evolution of video generation has enabled models to simulate complex physical dynamics and long-horizon causalities, positioning them as potential world simulators. However, a critical gap remains between the theoretical capacity for world simulation and the heavy computational costs of spatiotemporal modeling. To address this, we comprehensively and systematically review video generation frameworks and techniques that consider efficiency as a crucial requirement for practical world modeling. We introduce a novel taxonomy in three dimensions: efficient modeling paradigms, efficient network architectures, and efficient inference algorithms. We further show that bridging this efficiency gap directly empowers interactive applications such as autonomous driving, embodied AI, and game simulation. Finally, we identify emerging research frontiers in efficient video-based world…
