SAGE: Semantic-Aware Gray-Box Game Regression Testing with Large Language Models

Jinyu Cai; Jialong Li; Nianyu Li; Zhenyu Mao; Mingyue Zhang; Kenji Tei

arXiv:2512.00560 · cs.SE · December 2, 2025

TL;DR

SAGE introduces a semantic-aware gray-box regression testing framework for games that leverages large language models to automate test generation, optimize test suites, and prioritize relevant tests, reducing costs and improving bug detection.

Contribution

The paper presents SAGE, a framework that combines LLM-guided reinforcement learning with semantic analysis to automate regression testing in gray-box game environments, targeting the manual effort and test-suite redundancy that burden existing approaches.

Findings

1. SAGE outperforms baseline methods in bug detection across tested environments.

2. SAGE reduces testing costs significantly compared to traditional approaches.

3. SAGE adapts effectively to version updates, maintaining high testing relevance.

Abstract

The rapid iteration cycles of modern live-service games make regression testing indispensable for maintaining quality and stability. However, existing regression testing approaches face critical limitations, especially in common gray-box settings where full source code access is unavailable: they rely heavily on manual effort for test case construction, struggle to maintain growing suites plagued by redundancy, and lack efficient mechanisms for prioritizing relevant tests. These challenges result in excessive testing costs, limited automation, and insufficient bug detection. To address these issues, we propose SAGE, a semantic-aware regression testing framework for gray-box game environments. SAGE systematically addresses the core challenges of test generation, maintenance, and selection. It employs LLM-guided reinforcement learning for efficient, goal-oriented exploration to…

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Artificial Intelligence in Games