GameGen-X: Interactive Open-world Game Video Generation
Haoxuan Che, Xuanhua He, Quande Liu, Cheng Jin, Hao Chen

TL;DR
GameGen-X is a novel diffusion transformer model that enables high-quality, interactive open-world game video generation and control, leveraging a large dataset and specialized training for realistic gameplay simulation.
Contribution
The paper introduces GameGen-X, the first model for interactive open-world game video generation, together with a new dataset and a two-stage training process aimed at controllability and quality.
Findings
First open-world game video dataset, with over 1 million clips from 150+ games.
Successful implementation of interactive controllability in game video generation.
High-quality, diverse game videos generated with user control capabilities.
Abstract
We introduce GameGen-X, the first diffusion transformer model specifically designed for both generating and interactively controlling open-world game videos. This model facilitates high-quality, open-domain generation by simulating an extensive array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. Additionally, it provides interactive controllability, predicting and altering future content based on the current clip, thus allowing for gameplay simulation. To realize this vision, we first collected and built an Open-World Video Game Dataset from scratch. It is the first and largest dataset for open-world game video generation and control, which comprises over a million diverse gameplay video clips sampling from over 150 games with informative captions from GPT-4o. GameGen-X undergoes a two-stage training process,…
Peer Reviews
Decision: ICLR 2025 Poster
The primary strength of the paper without question is the authors' new dataset. There is no dataset even close to this in terms of quality or size, it's a really exciting potential addition to this research area. This is primarily a strength in terms of originality, quality, and significance. I say primarily since the authors do not include access to the dataset at the review stage, though they do have some metrics. The authors' GameGen-X and InstructNet are also strengths, but I have conc
The paper is relatively free of weaknesses in terms of originality, thanks in large part due to OGameData. However, the authors' work has some weaknesses in terms of the quality, clarity, and significance. This primarily comes down to (1) the authors' stated motivations and how this aligns with their work, (2) the way the authors overview their system, and (3) the experiments.
### Motivations and the Dataset
The authors motivate in two primary ways: (1) imagining this as a prototyping or early
1. A large well-curated dataset for open-world video games. The curation of the dataset contains filtering on different aspects (e.g., semantic alignment, motion), which results in a high-quality large-scale dataset.
2. The idea to build a video diffusion model for open-world video games is essentially interesting, and the results and demo videos are impressive. Besides, quantitatively, the proposed approach also achieves better performance than other SoTA diffusion models.
3. Detailed ablatio
1. One main concern is the proposed GameGen-X is specially fine-tuned/designed for open-world video games, while other diffusion models compared (e.g., kling) are trained for a general text-to-video generation, which makes the comparison somehow unfair to other models.
2. The qualitative examples in the website demo for game generation (e.g., under generation comparison) don't seem to look much better than other models (e.g., cogvideoX).
- The work is original in the sense that is the first main contribution to the field in terms of interactive video game generation in large scale, complex, open worlds
- It is great to see such examples of tackling complex research environments at scale, with potential direct benefits to the game development process.
- The author(s) introduce a complex system, both in terms of the dataset it required for training (including a resource intensive collection and curation process), as well as in
- There is one strong concern I have regarding the data collection process for the OGameData dataset. My score highly depends on evidence that data collection will pass the ethics review and there is evidence provided on the consent given by the humans that produced the data. There should be understanding and agreement for it to be used for research purposes and open sourced. Please elaborate on how the data for OGameData has been collected? In Appendix B.1. you mention selecting online video we
Taxonomy
Topics: Artificial Intelligence in Games · Educational Games and Gamification · Human Motion and Animation
Methods: Diffusion
