Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model
Junshu Tang, Jiacheng Liu, Jiaqi Li, Longhuang Wu, Haoyu Yang, Penghao Zhao, Siruis Gong, Xiang Yuan, Shuai Shao, Linfeng Zhang, Qinglin Lu

TL;DR
Hunyuan-GameCraft-2 is a generative game world model that lets users control generated game video through natural language prompts, keyboard, or mouse signals, rather than through a fixed set of keyboard actions.
Contribution
It introduces an instruction-driven interaction paradigm for generative game modeling, transforming unstructured text-video data into causally aligned interactive datasets, and develops a large-scale MoE-based model with an interaction benchmark.
Findings
Generates temporally coherent, causally grounded interactive videos
Responds accurately to diverse free-form user instructions
Outperforms existing models in interaction fidelity and flexibility
Abstract
Recent advances in generative world models have enabled remarkable progress in creating open-ended game environments, evolving from static scene synthesis toward dynamic, interactive simulation. However, current approaches remain limited by rigid action schemas and high annotation costs, restricting their ability to model diverse in-game interactions and player-driven dynamics. To address these challenges, we introduce Hunyuan-GameCraft-2, a new paradigm of instruction-driven interaction for generative game world modeling. Instead of relying on fixed keyboard inputs, our model allows users to control game video content through natural language prompts, keyboard, or mouse signals, enabling flexible and semantically rich interaction within generated worlds. We formally define the concept of interactive video data and develop an automated process to transform large-scale, unstructured…
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Artificial Intelligence in Games
