TL;DR
Matrix-Game is a large-scale, controllable, interactive world generation model for Minecraft. It is trained with a two-stage pipeline, evaluated on a new comprehensive benchmark, and outperforms previous models in quality and controllability.
Contribution
The paper introduces Matrix-Game, a novel interactive world foundation model, together with a large Minecraft dataset and a new benchmark, advancing controllability and physical understanding in game world generation.
Findings
Matrix-Game outperforms prior models on all benchmark metrics.
It achieves high visual quality and temporal coherence.
Human evaluations favor Matrix-Game's realism and controllability.
Abstract
We introduce Matrix-Game, an interactive world foundation model for controllable game world generation. Matrix-Game is trained using a two-stage pipeline that first performs large-scale unlabeled pretraining for environment understanding, followed by action-labeled training for interactive video generation. To support this, we curate Matrix-Game-MC, a comprehensive Minecraft dataset comprising over 2,700 hours of unlabeled gameplay video clips and over 1,000 hours of high-quality labeled clips with fine-grained keyboard and mouse action annotations. Our model adopts a controllable image-to-world generation paradigm, conditioned on a reference image, motion context, and user actions. With over 17 billion parameters, Matrix-Game enables precise control over character actions and camera movements, while maintaining high visual quality and temporal coherence. To evaluate performance, we…
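The abstract names two kinds of training data (unlabeled clips for stage one, clips with keyboard and mouse labels for stage two) and three conditioning signals (reference image, motion context, user actions). The Python sketch below illustrates how those pieces could fit together. It is hypothetical: `FrameAction`, `Clip`, `training_stage`, `build_condition`, and `context_len` are illustrative names, not the Matrix-Game code or the Matrix-Game-MC data format, and strings stand in for real image frames.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Optional, Sequence


# Hypothetical per-frame action label: keys held down plus mouse (camera) movement.
@dataclass(frozen=True)
class FrameAction:
    keys: FrozenSet[str] = frozenset()  # e.g. {"W", "space"}
    mouse_dx: float = 0.0  # horizontal camera rotation (units assumed)
    mouse_dy: float = 0.0  # vertical camera rotation (units assumed)


# Hypothetical clip: unlabeled clips carry no actions.
@dataclass
class Clip:
    frames: List[str]  # placeholder frame identifiers
    actions: Optional[List[FrameAction]] = None


def training_stage(clip: Clip) -> int:
    """Route a clip: 1 = unlabeled pretraining, 2 = action-labeled training."""
    if clip.actions is None:
        return 1
    if len(clip.actions) != len(clip.frames):
        raise ValueError("a labeled clip needs exactly one action per frame")
    return 2


def build_condition(
    reference_image: Any,
    history: Sequence[Any],
    actions: Sequence[FrameAction],
    context_len: int = 4,  # assumed size of the motion-context window
) -> Dict[str, Any]:
    """Bundle the three conditioning signals named in the abstract."""
    context = list(history[-context_len:]) if context_len > 0 else []
    return {
        "reference_image": reference_image,
        "motion_context": context,
        "actions": list(actions),
    }


if __name__ == "__main__":
    unlabeled = Clip(frames=["f0", "f1"])
    labeled = Clip(frames=["f0"], actions=[FrameAction(frozenset({"W"}), mouse_dx=2.5)])
    print(training_stage(unlabeled), training_stage(labeled))
    cond = build_condition("ref.png", ["a", "b", "c", "d", "e"], labeled.actions)
    print(cond["motion_context"])
```

Keeping only the last `context_len` frames as motion context is an assumption made for illustration. The paper's abstract does not specify how motion context is selected.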
Taxonomy
Methods: OASIS
