LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
Zicheng Duan, Jiatong Xia, Zeyu Zhang, Wenbo Zhang, Gengze Zhou, Chenhui Gou, Yefei He, Feng Chen, Xinyu Zhang, Lingqiao Liu

TL;DR
LiveWorld introduces a framework that enables generative video world models to simulate continuous, out-of-sight world dynamics, overcoming the limitation of static memory when objects leave the observer's view.
Contribution
The paper proposes LiveWorld, a novel approach that models persistent global states and active entities to simulate ongoing unseen dynamics in video world models.
Findings
Enables persistent evolution of unseen scene elements.
Maintains long-term scene consistency.
Bridges the gap between 2D memory and 4D world simulation.
Abstract
Recent generative video world models aim to simulate visual environment evolution, allowing an observer to interactively explore the scene via camera control. However, they implicitly assume that the world only evolves within the observer's field of view. Once an object leaves the observer's view, its state is "frozen" in memory, and revisiting the same region later often fails to reflect events that should have occurred in the meantime. In this work, we identify and formalize this overlooked limitation as the "out-of-sight dynamics" problem, which impedes video world models from representing a continuously evolving world. To address this issue, we propose LiveWorld, a novel framework that extends video world models to support persistent world evolution. Instead of treating the world as static observational memory, LiveWorld models a persistent global state composed of a static 3D…
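As a toy illustration of the out-of-sight dynamics problem described in the abstract, the sketch below contrasts a "frozen" memory, which only updates an entity while it is inside the observer's view, with a persistent state that keeps evolving regardless of visibility. It is not LiveWorld's method: the 1D world, the `Entity` class, the `advance` function, and the constant-velocity motion are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical 1D entity; not a structure taken from the paper."""
    position: float
    velocity: float


def advance(entity, views, persistent, dt=1.0):
    """Step the entity once per view interval in `views`.

    views: list of (low, high) intervals the observer sees at each time step.
    persistent=False mimics frozen memory: the entity's state only changes
    while its remembered position lies inside the current view.
    persistent=True mimics a continuously evolving world state.
    """
    for low, high in views:
        visible = low <= entity.position <= high
        if persistent or visible:
            entity = Entity(entity.position + entity.velocity * dt,
                            entity.velocity)
    return entity


# The observer watches the entity, looks away for two steps, then looks back.
views = [(-1, 1), (-1, 1), (10, 12), (10, 12), (-1, 5)]
start = Entity(position=0.0, velocity=1.0)

frozen = advance(start, views, persistent=False)
live = advance(start, views, persistent=True)
print(frozen.position, live.position)
```

When the observer returns, the frozen variant resumes from where the entity was last seen, while the persistent variant reflects the motion that should have happened in the meantime. That difference is the gap the abstract attributes to existing video world models.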
Taxonomy
Topics: Human Motion and Animation · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
