RESP: Reference-guided Sequential Prompting for Visual Glitch Detection in Video Games
Yakun Yu, Ashley Wiens, Adrián Barahona-Ríos, Benedict Wilkins, Saman Zadtootaghaj, Nabajeet Barman, Cor-Paul Bezemer

TL;DR
RESP introduces a reference-guided sequential prompting framework using vision-language models to improve video-level glitch detection in video games, addressing limitations of prior single-frame methods.
Contribution
The paper proposes a novel reference-guided prompting approach that enhances robustness in video glitch detection without fine-tuning VLMs, validated on synthetic and real datasets.
Findings
Reference guidance improves frame-level glitch detection across multiple VLMs.
The method achieves stronger video-level triage in realistic QA scenarios.
Experiments show consistent performance gains on both synthetic and real datasets.
Abstract
Visual glitches in video games degrade player experience and perceived quality, yet manual quality assurance cannot scale to the growing test surface of modern game development. Prior automation efforts, particularly those using vision-language models (VLMs), largely operate on single frames or rely on limited video-level baselines that struggle under realistic scene variation, making robust video-level glitch detection challenging. We present RESP, a practical multi-frame framework for gameplay glitch detection with VLMs. Our key idea is reference-guided prompting: for each test frame, we select a reference frame from earlier in the same video, establishing a visual baseline and reframing detection as within-video comparison rather than isolated classification. RESP sequentially prompts the VLM with reference/test pairs and aggregates noisy frame predictions into a stable video-level…
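The reference/test loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The VLM call is abstracted as a hypothetical `classify_pair(reference, test)` callable that returns True when the test frame looks glitched relative to the reference. Two further choices are assumptions made only for illustration, since the abstract does not specify them here: the reference is always the first frame of the video, and frame predictions are aggregated by a simple fraction threshold (`min_glitch_fraction`).

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def detect_video_glitch(
    frames: Sequence[Frame],
    classify_pair: Callable[[Frame, Frame], bool],
    min_glitch_fraction: float = 0.3,
) -> Tuple[bool, List[bool]]:
    """Return (video_is_glitched, per_frame_predictions).

    classify_pair is a hypothetical stand-in for one VLM prompt that
    compares a test frame against a reference frame.
    """
    if len(frames) < 2:
        # No earlier frame to use as a reference, so nothing to compare.
        return False, []

    # Assumption: use the first frame as the reference for every test frame.
    reference = frames[0]

    # Sequentially prompt with (reference, test) pairs.
    predictions = [classify_pair(reference, test) for test in frames[1:]]

    # Assumption: flag the video when enough frames are predicted glitched.
    glitch_fraction = sum(predictions) / len(predictions)
    return glitch_fraction >= min_glitch_fraction, predictions


# Toy usage: frames are integers and the "VLM" flags large deviations.
toy_frames = [0, 1, 2, 10, 11, 1]
video_flag, frame_flags = detect_video_glitch(
    toy_frames, lambda ref, test: abs(test - ref) > 5
)
```

In the toy run, two of the five test frames deviate strongly from the reference, so the video is flagged under the assumed 0.3 threshold. In a real pipeline the callable would wrap a VLM prompt containing both images, and the threshold would be tuned to trade missed glitches against false alarms.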
