TL;DR
MultiWorld is a scalable multi-agent multi-view video world model that improves control, consistency, and efficiency in complex environments, outperforming baselines in fidelity and multi-view synthesis.
Contribution
MultiWorld introduces the Multi-Agent Condition Module for precise multi-agent control and the Global State Encoder for coherent observations across views. It also supports flexible scaling of agent and view counts and parallel view synthesis.
Findings
Outperforms baselines in video fidelity and multi-view consistency.
Supports scalable multi-agent and multi-view environments.
Demonstrates effectiveness in multi-player games and multi-robot tasks.
Abstract
Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing approaches are limited to single-agent scenarios and fail to capture the complex interactions inherent in real-world multi-agent systems. We present MultiWorld, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency. We introduce the Multi-Agent Condition Module to achieve precise multi-agent controllability, and the Global State Encoder to ensure coherent observations across different views. MultiWorld supports flexible scaling of agent and view counts, and synthesizes different views in…
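The abstract frames a world model as an action-conditioned generator: historical frames plus current actions go in, future frames come out. The sketch below illustrates only that interface, extended to several agents and several views. It is not MultiWorld's architecture. The names `toy_predictor`, `rollout`, and `context_len` are hypothetical, frames are plain lists of floats, and the "dynamics" are a placeholder standing in for a learned video generator.

```python
from typing import Callable, List, Sequence

Frame = List[float]            # toy stand-in for one image (flattened pixels)
MultiViewFrame = List[Frame]   # one frame per camera view at a single timestep
Actions = List[float]          # one scalar action per agent (hypothetical encoding)
Predictor = Callable[[Sequence[MultiViewFrame], Actions], MultiViewFrame]


def toy_predictor(history: Sequence[MultiViewFrame], actions: Actions) -> MultiViewFrame:
    """Placeholder dynamics, NOT a learned model.

    Shifts every pixel of the most recent frame by the sum of all agents'
    actions, so every view reacts to the same joint action.
    """
    shift = sum(actions)
    return [[pixel + shift for pixel in view] for view in history[-1]]


def rollout(
    predict: Predictor,
    history: Sequence[MultiViewFrame],
    action_sequence: Sequence[Actions],
    context_len: int = 4,
) -> List[MultiViewFrame]:
    """Autoregressive rollout: each predicted frame is appended to the
    history and becomes part of the context for the next step."""
    frames = list(history)
    for actions in action_sequence:
        context = frames[-context_len:]           # most recent frames only
        frames.append(predict(context, actions))  # all views for this step
    return frames[len(history):]                  # only the newly predicted frames


# One past timestep, two views, two "pixels" per view.
initial_history = [[[0.0, 0.0], [1.0, 1.0]]]
# Two future steps, two agents acting at each step.
joint_actions = [[1.0, 2.0], [0.5, -0.5]]
predicted = rollout(toy_predictor, initial_history, joint_actions)
```

Swapping `toy_predictor` for a learned model leaves the loop unchanged, which is why the history-plus-actions interface is a convenient abstraction for both single-agent and multi-agent settings.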
