GameGen-Verifier: Parallel Keypoint-Based Verification for LLM-Generated Games via Runtime State Injection

Chaobo Jia; Ruipeng Wan; Ting Sun; Weihao Tan; Borui Wan; Yuxuan Tong; Guangming Sheng; Hong Xu

arXiv:2605.07442 · cs.LG · May 11, 2026

TL;DR

GameGen-Verifier introduces a scalable, automated verification method for LLM-generated games by decomposing specifications into keypoints and independently verifying each through runtime state injection.

Contribution

The paper proposes a verification paradigm that decomposes game correctness into verifiable keypoints, and it implements a scalable harness for efficient, automated validation.

Findings

1. Achieves up to 92.2% accuracy against human judgments.
2. Reduces verification time by a factor of up to 16.6.
3. Outperforms the existing Agent-as-a-Verifier baseline.

Abstract

LLM-based game generation promises to turn natural-language specifications into executable games, but progress is limited by the lack of reliable automated verification. Unlike conventional code generation, game correctness is defined over long-horizon interaction: a game may appear correct while violating core mechanics such as state updates, interaction rules, and phase transitions. Existing Agent-as-a-Verifier approaches collapse verification into open-ended gameplay, making verdicts reachability-bound, time-consuming, coverage-limited, and sensitive to the agent's gameplay ability. We present GameGen-Verifier, an automated verification paradigm for LLM-generated games that decomposes a specification into verifiable keypoints and grounds them into independent verification units. Each unit patches the game runtime into a concrete target state, executes a bounded interaction, and…
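To make the idea of a verification unit concrete, here is a minimal Python sketch of the pattern the abstract describes: inject a target state into the game runtime, run a bounded sequence of inputs, and evaluate one keypoint. It is not the authors' harness. `ToyGame`, `VerificationUnit`, `run_unit`, `verify`, and the example keypoints are all hypothetical names invented for illustration, and because the abstract is cut off after "executes a bounded interaction", the final check step (a predicate over the resulting state) is an assumption about how a verdict could be produced.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


class ToyGame:
    """Hypothetical stand-in for a generated game: walk right to collect a coin."""

    def __init__(self) -> None:
        self.state: State = {"x": 0, "coin_at": 5, "score": 0, "phase": "playing"}

    def step(self, action: str) -> None:
        if self.state["phase"] != "playing":
            return  # inputs are ignored once the game has ended
        if action == "right":
            self.state["x"] += 1
        if self.state["x"] == self.state["coin_at"]:
            self.state["score"] += 1
            self.state["coin_at"] = -1  # coin is consumed
        if self.state["score"] >= 3:
            self.state["phase"] = "won"


@dataclass
class VerificationUnit:
    keypoint: str                   # one checkable claim from the specification
    injected_state: State           # target state patched into the runtime
    actions: List[str]              # bounded interaction to execute
    check: Callable[[State], bool]  # assumed verdict: predicate on the end state


def run_unit(unit: VerificationUnit) -> Tuple[str, bool]:
    game = ToyGame()  # fresh runtime per unit, so units stay independent
    game.state.update(copy.deepcopy(unit.injected_state))  # state injection
    for action in unit.actions:
        game.step(action)
    return unit.keypoint, unit.check(game.state)


def verify(units: List[VerificationUnit], max_workers: int = 4) -> Dict[str, bool]:
    # Units share no state, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_unit, units))


UNITS = [
    VerificationUnit(
        "collecting a coin adds one point",
        {"x": 4, "coin_at": 5, "score": 0},
        ["right"],
        lambda s: s["score"] == 1,
    ),
    VerificationUnit(
        "reaching three points ends the game",
        {"x": 4, "coin_at": 5, "score": 2},
        ["right"],
        lambda s: s["phase"] == "won",
    ),
    VerificationUnit(
        "movement is ignored after the game is won",
        {"x": 7, "phase": "won"},
        ["right"],
        lambda s: s["x"] == 7,
    ),
]

results = verify(UNITS)
for keypoint, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {keypoint}")
```

The second unit shows why injection matters in this toy setting: the win condition is tested by starting one step away from it, rather than by playing until a score of two is reached. In a real generated game, patching the runtime would be considerably more involved than updating a dictionary.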
