MagicWorld: Towards Long-Horizon Stability for Interactive Video World Exploration
Guangyuan Li, Bo Li, Jinwei Chen, Xiaobin Hu, Lei Zhao, Peng-Tao Jiang

TL;DR
MagicWorld is an autoregressive interactive video world model that improves long-horizon stability and motion realism through a flow-guided motion preservation constraint and additional training strategies, supported by a new real-world dataset.
Contribution
The paper proposes techniques for reducing motion drift and error accumulation in long-horizon interactive video generation, with the aim of advancing the state of the art in dynamic scene modeling.
Findings
Improves motion realism in complex dynamic environments
Reduces error accumulation over long interactions
Demonstrates superior performance on the RealWM120K dataset
Abstract
Recent interactive video world model methods generate scene evolution conditioned on user instructions. Although they achieve impressive results, two key limitations remain. First, they exhibit motion drift in complex environments with multiple interacting subjects, where dynamic subjects fail to follow realistic motion patterns during scene evolution. Second, they suffer from error accumulation in long-horizon interactions, where autoregressive generation gradually drifts from earlier scene states and causes structural and semantic inconsistencies. In this paper, we propose MagicWorld, an interactive video world model built upon an autoregressive framework. To address motion drift, we incorporate a flow-guided motion preservation constraint that mitigates motion degradation in dynamic subjects, encouraging realistic motion patterns and stable interactions during scene evolution. To…
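The error accumulation described in the abstract can be illustrated with a toy sketch. This is not MagicWorld's model or training procedure: the scalar "scene state", the function name rollout_drift, and the constant per-step bias are all assumptions made purely for illustration. The only point it shows is that when each step is predicted from the model's own previous output, a small per-step error compounds as the horizon grows.

```python
def rollout_drift(num_steps, step_bias):
    """Toy autoregressive rollout (hypothetical, not the paper's method).

    A scalar stands in for the scene state. The ground truth advances by
    exactly 1.0 per step; the 'model' advances by 1.0 + step_bias, and it
    always continues from its own previous prediction, never from the
    ground truth. Returns the absolute error after each step.
    """
    true_state = 0.0
    predicted_state = 0.0
    errors = []
    for _ in range(num_steps):
        # Ground-truth dynamics.
        true_state += 1.0
        # Model step: conditioned on its own previous output, so the
        # small per-step error is carried forward and never corrected.
        predicted_state += 1.0 + step_bias
        errors.append(abs(predicted_state - true_state))
    return errors


if __name__ == "__main__":
    errors = rollout_drift(num_steps=50, step_bias=0.01)
    for step in (1, 10, 50):
        print(f"step {step:2d}: abs error = {errors[step - 1]:.4f}")
```

In this sketch the error grows linearly with the number of steps because the bias is constant. Real video models have far more complex error dynamics, but the same feedback structure is what makes long-horizon interaction harder than short-horizon generation.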
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Human Motion and Animation
