TL;DR
MultiGen introduces an external memory system into diffusion game engines, enabling real-time multiplayer interaction and user-controlled, editable environments in interactive video worlds.
Contribution
The paper decomposes generation into Memory, Observation, and Dynamics modules, an architecture that supports persistent, editable environments and multiplayer interaction in diffusion-based world models.
Findings
Enables real-time multiplayer world editing and interaction.
Provides persistent environment states independent of model context.
Supports coherent multi-user experiences in diffusion game engines.
Abstract
Video world models have shown immense promise for interactive simulation and entertainment, but current systems still struggle with two important aspects of interactivity: user control over the environment for reproducible, editable experiences, and shared inference where players hold influence over a common world. To address these limitations, we introduce an explicit external memory into the system, a persistent state operating independent of the model's context window, that is continually updated by user actions and queried throughout the generation roll-out. Unlike conventional diffusion game engines that operate as next-frame predictors, our approach decomposes generation into Memory, Observation, and Dynamics modules. This design gives users direct, editable control over environment structure via an editable memory representation, and it naturally extends to real-time multiplayer…
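To make the Memory / Observation / Dynamics split concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the grid world, the class and function names (`Memory`, `dynamics`, `observe`), and the text "frames" are all assumptions chosen for illustration. In the described system the observation step would presumably condition a diffusion model rather than print characters. The sketch only shows the data flow the abstract describes: a persistent state outside any model context, updated by user actions and queried at every step by each player.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

# Hypothetical action vocabulary; the paper's action space is not specified here.
MOVES: Dict[str, Cell] = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class Memory:
    """Persistent, editable world state shared by all players (toy layout)."""
    width: int
    height: int
    walls: Set[Cell] = field(default_factory=set)
    players: Dict[str, Cell] = field(default_factory=dict)

    def edit(self, cell: Cell, wall: bool) -> None:
        """Direct user edit of the environment, independent of any rollout."""
        if wall:
            self.walls.add(cell)
        else:
            self.walls.discard(cell)


def dynamics(memory: Memory, player: str, action: str) -> None:
    """Update memory from one player's action (stand-in for a learned Dynamics module)."""
    x, y = memory.players[player]
    dx, dy = MOVES.get(action, (0, 0))
    nx, ny = x + dx, y + dy
    in_bounds = 0 <= nx < memory.width and 0 <= ny < memory.height
    if in_bounds and (nx, ny) not in memory.walls:
        memory.players[player] = (nx, ny)


def observe(memory: Memory, player: str, radius: int = 1) -> List[str]:
    """Query memory for one player's local view (stand-in for an Observation module)."""
    px, py = memory.players[player]
    others = {pos for name, pos in memory.players.items() if name != player}
    rows = []
    for y in range(py - radius, py + radius + 1):
        row = ""
        for x in range(px - radius, px + radius + 1):
            inside = 0 <= x < memory.width and 0 <= y < memory.height
            if (x, y) == (px, py):
                row += "@"          # the observing player
            elif not inside or (x, y) in memory.walls:
                row += "#"          # wall or out of bounds
            elif (x, y) in others:
                row += "P"          # another player in the shared world
            else:
                row += "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    world = Memory(width=5, height=5, players={"a": (1, 1), "b": (2, 1)})
    world.edit((1, 2), wall=True)      # one user edits the shared environment
    dynamics(world, "a", "down")       # blocked by the new wall
    print("\n".join(observe(world, "b")))
```

Because both players read from and write to the same `Memory` object, an edit made by one is visible in the other's next observation, which is the property the abstract attributes to keeping state outside the model's context window.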
