From Virtual Games to Real-World Play
Wenqiang Sun, Fangyun Wei, Jinjing Zhao, Xi Chen, Zilong Chen, Hongyang Zhang, Jun Zhang, Yan Lu

TL;DR
RealPlay is a neural network-based system that generates photorealistic, temporally consistent videos from user controls, enabling interactive real-world video synthesis without real-world action annotations.
Contribution
RealPlay introduces an interactive video generation framework in which control learned from virtual game data transfers to real-world scenarios and across different entity types.
Findings
Transfers control effectively from virtual to real-world scenarios.
Generalizes to diverse entities beyond training data.
Produces photorealistic, temporally consistent videos.
Abstract
We introduce RealPlay, a neural network-based real-world game engine that enables interactive video generation from user control signals. Unlike prior works focused on game-style visuals, RealPlay aims to produce photorealistic, temporally consistent video sequences that resemble real-world footage. It operates in an interactive loop: users observe a generated scene, issue a control command, and receive a short video chunk in response. To enable such realistic and responsive generation, we address key challenges including iterative chunk-wise prediction for low-latency feedback, temporal consistency across iterations, and accurate control response. RealPlay is trained on a combination of labeled game data and unlabeled real-world videos, without requiring real-world action annotations. Notably, we observe two forms of generalization: (1) control transfer: RealPlay effectively maps…
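The interactive loop described in the abstract (observe, issue a command, receive a short chunk) can be illustrated with a minimal sketch. This is not the authors' implementation. The class `InteractiveSession`, the function `toy_generator`, the command names `forward`, `left`, and `right`, and the chunk and context lengths are all hypothetical. A "frame" here is a single number standing in for an image, and the toy generator only mimics the interface of a chunk-wise video model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Frame = List[float]  # placeholder for an image; a real system would use tensors


@dataclass
class InteractiveSession:
    """Hypothetical chunk-wise generation loop (assumed interface, not RealPlay's)."""
    generate_chunk: Callable[[Sequence[Frame], str, int], List[Frame]]
    chunk_len: int = 4      # frames returned per command (assumed value)
    context_len: int = 4    # recent frames fed back as conditioning (assumed value)
    frames: List[Frame] = field(default_factory=list)

    def step(self, command: str) -> List[Frame]:
        # Condition on the most recent frames so consecutive chunks stay consistent.
        context = self.frames[-self.context_len:]
        chunk = self.generate_chunk(context, command, self.chunk_len)
        self.frames.extend(chunk)
        return chunk


def toy_generator(context: Sequence[Frame], command: str, n: int) -> List[Frame]:
    # Stand-in for a learned model: shifts a 1-D "position" per frame
    # by an amount that depends on the (hypothetical) command.
    delta = {"forward": 1.0, "left": -0.5, "right": 0.5}.get(command, 0.0)
    last = context[-1][0] if context else 0.0
    return [[last + delta * (i + 1)] for i in range(n)]


session = InteractiveSession(generate_chunk=toy_generator)
session.frames.append([0.0])  # initial observed frame
for cmd in ["forward", "forward", "left"]:
    chunk = session.step(cmd)  # the user would view this chunk, then pick the next command
```

Each call to `step` returns only a short chunk, which is what keeps feedback latency low, while feeding the tail of the history back in as context is one simple way to carry continuity across iterations.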
