How Far Can VLMs Go for Visual Bug Detection? Studying 19,738 Keyframes from 41 Hours of Gameplay Videos

Wentao Lu; Alexander Senchenko; Alan Sayle; Abram Hindle; Cor-Paul Bezemer

arXiv:2603.22706 · cs.CV · March 25, 2026

Open Access

TL;DR

This study evaluates the effectiveness of off-the-shelf vision language models in detecting visual bugs in long gameplay videos, revealing limited improvements with common enhancements and highlighting the need for hybrid approaches.

Contribution

It provides a real-world assessment of VLMs for visual bug detection in gameplay videos, showing their current capabilities and limitations without fine-tuning.

Findings

01. With a single-prompt baseline, the VLM achieves a precision of 0.50 and an accuracy of 0.72 on bug detection.

02. Common enhancement strategies offer only marginal improvements and add computational cost.

03. Off-the-shelf VLMs can detect some visual bugs, but hybrid methods are needed for better performance.

Abstract

Video-based quality assurance (QA) for long-form gameplay video is labor-intensive and error-prone, yet valuable for assessing game stability and visual correctness over extended play sessions. Vision language models (VLMs) promise general-purpose visual reasoning capabilities and thus appear attractive for detecting visual bugs directly from video frames. Recent benchmarks suggest that VLMs can achieve promising results in detecting visual glitches on curated datasets. Building on these findings, we conduct a real-world study using industrial QA gameplay videos to evaluate how well VLMs perform in practical scenarios. Our study samples keyframes from long gameplay videos and asks a VLM whether each keyframe contains a bug. Starting from a single-prompt baseline, the model achieves a precision of 0.50 and an accuracy of 0.72. We then examine two common enhancement strategies used to…
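
The two baseline metrics measure different things, and a short sketch may help show how a precision of 0.50 can sit alongside an accuracy of 0.72. The Python below computes both from per-keyframe verdicts. The counts are hypothetical, chosen only so that they reproduce the reported values. They are not the paper's confusion matrix, and the function names are illustrative.

```python
from collections import Counter


def confusion_counts(labels, preds):
    """Count TP/FP/TN/FN for per-keyframe bug verdicts (True means 'bug')."""
    counts = Counter()
    for label, pred in zip(labels, preds):
        if pred and label:
            counts["tp"] += 1
        elif pred and not label:
            counts["fp"] += 1
        elif not pred and not label:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def precision(counts):
    """Share of frames flagged as buggy that really contain a bug."""
    flagged = counts["tp"] + counts["fp"]
    return counts["tp"] / flagged if flagged else 0.0


def accuracy(counts):
    """Share of all frames whose verdict matches the ground truth."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# HYPOTHETICAL data: 100 keyframes, not taken from the paper.
# 10 true positives, 10 false positives, 62 true negatives, 18 false negatives.
labels = [True] * 10 + [False] * 10 + [False] * 62 + [True] * 18
preds = [True] * 10 + [True] * 10 + [False] * 62 + [False] * 18

counts = confusion_counts(labels, preds)
print(f"precision={precision(counts):.2f} accuracy={accuracy(counts):.2f}")
```

In this toy split, most frames are bug-free and are correctly left unflagged, which lifts accuracy, while half of the flagged frames are false alarms, which holds precision at 0.50. Whether the study's data has the same shape is not stated in the abstract.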

Taxonomy

Topics: Video Analysis and Summarization · Software Engineering Research · Multimodal Machine Learning Applications