The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
Ruili Feng, Han Zhang, Zhantao Yang, Jie Xiao, Zhilei Shu, Zhiheng Liu, Andy Zheng, Yukun Huang, Yu Liu, Hongyang Zhang

TL;DR
The paper introduces The Matrix, a high-fidelity, real-time world simulator that generates continuous, immersive video streams of diverse environments. Trained on both game and real-world data, it supports realistic exploration and generalizes zero-shot to real-world scenes.
Contribution
It presents the first foundational world simulator to combine high-resolution video generation with real-time control, trained on limited supervised game data together with large-scale unsupervised real-world footage.
Findings
Supports real-time interactivity at 16 FPS
Generates uncut hour-long sequences in diverse environments
Demonstrates zero-shot generalization to real-world scenarios
Abstract
We present The Matrix, the first foundational realistic world simulator capable of generating continuous 720p high-fidelity real-scene video streams with real-time, responsive control in both first- and third-person perspectives, enabling immersive exploration of richly dynamic environments. Trained on limited supervised data from AAA games like Forza Horizon 5 and Cyberpunk 2077, complemented by large-scale unsupervised footage from real-world settings like Tokyo streets, The Matrix allows users to traverse diverse terrains -- deserts, grasslands, water bodies, and urban landscapes -- in continuous, uncut hour-long sequences. Operating at 16 FPS, the system supports real-time interactivity and demonstrates zero-shot generalization, translating virtual game environments to real-world contexts where collecting continuous movement data is often infeasible. For example, The Matrix can…
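To make the headline numbers concrete, the sketch below does the back-of-the-envelope arithmetic implied by the abstract: the per-frame time budget at 16 FPS, the frame count of an uncut one-hour sequence, and the raw pixel throughput at 720p. It assumes the 720p output is the standard 1280×720 frame; the helper names are hypothetical and are not taken from the paper or any released code.

```python
# Back-of-the-envelope arithmetic for the reported operating point.
# Assumption: "720p" means a standard 1280x720 frame. Helper names are hypothetical.

FPS = 16                   # frame rate reported in the abstract
WIDTH, HEIGHT = 1280, 720  # assumed 720p frame size (16:9)


def frame_budget_ms(fps: int) -> float:
    """Wall-clock time available to produce one frame at the given frame rate."""
    return 1000.0 / fps


def frames_for_duration(fps: int, seconds: int) -> int:
    """Number of frames in an uncut sequence of the given length."""
    return fps * seconds


def pixels_per_second(fps: int, width: int, height: int) -> int:
    """Raw pixel throughput the generator must sustain at this resolution and rate."""
    return fps * width * height


if __name__ == "__main__":
    print(f"per-frame budget: {frame_budget_ms(FPS):.1f} ms")              # 62.5 ms
    print(f"frames in one hour: {frames_for_duration(FPS, 3600)}")         # 57600
    print(f"pixels per second: {pixels_per_second(FPS, WIDTH, HEIGHT)}")   # 14745600
```

In other words, staying real-time at 16 FPS leaves roughly 62.5 ms to generate each frame, and an hour-long uncut sequence amounts to 57,600 consecutive frames.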
Taxonomy
Topics: Human Motion and Animation
