Towards Game-Playing AI Benchmarks via Performance Reporting Standards
Vanessa Volz, Boris Naujoks

TL;DR
This paper proposes standardized reporting guidelines for AI game-playing performance to enable unbiased comparisons and facilitate the development of benchmarks and competitions.
Contribution
It introduces reporting guidelines for AI game-playing performance, addressing the lack of a standardised framework and the resulting lack of comparability across studies.
Findings
The guidelines, if followed, improve the clarity and comparability of AI performance reports.
They facilitate the creation of benchmarks and competitions.
They support more general conclusions about the strengths and weaknesses of AI algorithms and about the challenges different games pose.
Abstract
While games have been used extensively as milestones to evaluate game-playing AI, there exists no standardised framework for reporting the obtained observations. As a result, it remains difficult to draw general conclusions about the strengths and weaknesses of different game-playing AI algorithms. In this paper, we propose reporting guidelines for AI game-playing performance that, if followed, provide information suitable for unbiased comparisons between different AI approaches. The vision we describe is to build benchmarks and competitions based on such guidelines in order to be able to draw more general conclusions about the behaviour of different AI algorithms, as well as the types of challenges different games pose.
