Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms

Muyang He; Hanzhong Guo; Junxiong Lin; Yizhou Yu

arXiv:2603.28489 · eess.IV · May 6, 2026

TL;DR

This paper reviews efficient video generation methods for world modeling, introduces a taxonomy of them, and argues that efficiency is essential for real-time applications such as autonomous driving and embodied AI.

Contribution

It presents a new taxonomy of efficient video generation paradigms, architectures, and algorithms, highlighting their role in practical world simulation.

Findings

1. Bridging efficiency gaps enables real-time interactive applications.
2. Efficient modeling improves scalability for complex physical dynamics.
3. Emerging frontiers focus on real-time, robust world simulators.

Abstract

The rapid evolution of video generation has enabled models to simulate complex physical dynamics and long-horizon causalities, positioning them as potential world simulators. However, a critical gap remains between the theoretical capacity for world simulation and the heavy computational cost of spatiotemporal modeling. To address this, we comprehensively and systematically review video generation frameworks and techniques that treat efficiency as a crucial requirement for practical world modeling. We introduce a novel taxonomy along three dimensions: efficient modeling paradigms, efficient network architectures, and efficient inference algorithms. We further show that bridging this efficiency gap directly empowers interactive applications such as autonomous driving, embodied AI, and game simulation. Finally, we identify emerging research frontiers in efficient video-based world…