VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj, Nabajeet Barman, Cor-Paul Bezemer

TL;DR
VideoGameQA-Bench is a new standardized benchmark designed to evaluate vision-language models' effectiveness in automating and improving various video game quality assurance tasks, addressing a critical need in the gaming industry.
Contribution
The paper introduces VideoGameQA-Bench, the first comprehensive benchmark specifically for assessing vision-language models in video game quality assurance tasks.
Findings
The benchmark covers diverse game QA activities, including visual testing and bug detection.
It provides standardized evaluation metrics for VLMs in game QA.
It facilitates future research and development in automated game testing.
Abstract
With video games now generating the highest revenues in the entertainment industry, optimizing game development workflows has become essential for the sector's sustained growth. Recent advancements in Vision-Language Models (VLMs) offer considerable potential to automate and enhance various aspects of game development, particularly Quality Assurance (QA), which remains one of the industry's most labor-intensive processes with limited automation options. To accurately evaluate the performance of VLMs in video game QA tasks and determine their effectiveness in handling real-world scenarios, there is a clear need for standardized benchmarks, as existing benchmarks are insufficient to address the specific requirements of this domain. To bridge this gap, we introduce VideoGameQA-Bench, a comprehensive benchmark that covers a wide array of game QA activities, including visual unit testing,…
Taxonomy
Topics: Video Analysis and Summarization · Data Visualization and Analytics · Video Surveillance and Tracking Methods
