Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model

Junshu Tang; Jiacheng Liu; Jiaqi Li; Longhuang Wu; Haoyu Yang; Penghao Zhao; Siruis Gong; Xiang Yuan; Shuai Shao; Linfeng Zhang; Qinglin Lu

arXiv:2511.23429 · cs.CV · February 11, 2026

TL;DR

Hunyuan-GameCraft-2 is a generative game world model that lets users control dynamic, interactive game video through natural language prompts, keyboard, or mouse signals.

Contribution

It introduces an instruction-driven interaction paradigm for generative game modeling, an automated process that transforms unstructured text-video data into causally aligned interactive datasets, and a large-scale MoE-based model with an accompanying interaction benchmark.

Findings

01. Generates temporally coherent, causally grounded interactive videos

02. Responds accurately to diverse free-form user instructions

03. Outperforms existing models in interaction fidelity and flexibility

Abstract

Recent advances in generative world models have enabled remarkable progress in creating open-ended game environments, evolving from static scene synthesis toward dynamic, interactive simulation. However, current approaches remain limited by rigid action schemas and high annotation costs, restricting their ability to model diverse in-game interactions and player-driven dynamics. To address these challenges, we introduce Hunyuan-GameCraft-2, a new paradigm of instruction-driven interaction for generative game world modeling. Instead of relying on fixed keyboard inputs, our model allows users to control game video contents through natural language prompts, keyboard, or mouse signals, enabling flexible and semantically rich interaction within generated worlds. We formally defined the concept of interactive video data and developed an automated process to transform large-scale, unstructured…

Taxonomy

Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Artificial Intelligence in Games