Best Agent Identification for General Game Playing
Matthew Stephenson, Alex Newcombe, Eric Piette, Dennis Soemers

TL;DR
This paper introduces a bandit-based method for efficiently identifying the best algorithm for each task in a multi-problem domain, making agent selection in general game playing more accurate under a limited trial budget.
Contribution
It proposes an optimistic selection process for best arm identification in multi-armed bandits, tailored for general game playing, with significant performance improvements over previous methods.
Findings
Substantially lower average simple regret than previous methods.
Lower probability of error in agent selection.
Effective on both the GVGAI framework and the Ludii general game system.
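For reference, the two metrics named above can be written out using their standard definitions. The notation is an assumption introduced here, not taken from the paper: $M$ bandits (games), true mean reward $\mu_{m,a}$ for arm (agent) $a$ in bandit $m$, and $J_m$ the arm recommended for bandit $m$ once the trial budget is spent.

```latex
% Average simple regret across the M bandits
\bar{r} = \frac{1}{M} \sum_{m=1}^{M} \left( \max_{a} \mu_{m,a} - \mu_{m,J_m} \right)

% Probability of error for bandit m
e_m = \Pr\left[ \mu_{m,J_m} < \max_{a} \mu_{m,a} \right]
```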
Abstract
We present an efficient and generalised procedure to accurately identify the best (or near-best) performing algorithm for each sub-task in a multi-problem domain. Our approach treats this as a set of best arm identification problems for multi-armed bandits, where each bandit corresponds to a specific task and each arm corresponds to a specific algorithm or agent. We propose an optimistic selection process, based on a chosen confidence interval, that ranks each arm across all bandits in terms of its potential to influence our overall simple regret. We evaluate the performance of our approach on two of the most popular general game playing domains, the General Video Game AI (GVGAI) framework and the Ludii general game playing system, with the goal of selecting a high-performing agent for each game using a limited number of available trials. Compared to previous best arm identification…
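The general idea of ranking arms across several bandits by an optimistic, confidence-interval-based score can be illustrated with a minimal sketch. It is not the paper's actual selection rule, which this summary does not give. The sketch assumes a Hoeffding-style confidence radius and a gap-based score (best challenger's upper bound minus the current leader's lower bound), and simulates rewards as Bernoulli wins. All function names and parameters (`confidence_radius`, `select_next`, `identify_best_agents`, `scale`, `true_win_rates`) are hypothetical.

```python
import math
import random


def confidence_radius(pulls, total_pulls, scale=1.0):
    """Hoeffding-style radius (an assumed choice of confidence interval)."""
    return scale * math.sqrt(math.log(max(total_pulls, 2)) / (2 * pulls))


def select_next(means, counts, total_pulls):
    """Pick the (bandit, arm) pair with the largest optimistic gap.

    For each bandit, the score is the challenger's upper bound minus the
    leader's lower bound: an optimistic estimate of how much regret the
    current recommendation might carry. Assumes >= 2 arms per bandit.
    """
    best_score, choice = -math.inf, None
    for b, (mu, n) in enumerate(zip(means, counts)):
        rad = [confidence_radius(k, total_pulls) for k in n]
        leader = max(range(len(mu)), key=mu.__getitem__)
        challenger = max(
            (a for a in range(len(mu)) if a != leader),
            key=lambda a: mu[a] + rad[a],
        )
        score = (mu[challenger] + rad[challenger]) - (mu[leader] - rad[leader])
        if score > best_score:
            # Within the chosen bandit, sample the less-explored of the two.
            arm = leader if n[leader] <= n[challenger] else challenger
            best_score, choice = score, (b, arm)
    return choice


def identify_best_agents(true_win_rates, budget, seed=0):
    """Spend `budget` simulated trials, then recommend one arm per bandit."""
    rng = random.Random(seed)
    counts = [[0] * len(p) for p in true_win_rates]
    sums = [[0.0] * len(p) for p in true_win_rates]
    total = 0

    def pull(b, a):
        nonlocal total
        reward = 1.0 if rng.random() < true_win_rates[b][a] else 0.0
        counts[b][a] += 1
        sums[b][a] += reward
        total += 1

    def current_means():
        return [[s / k for s, k in zip(sr, kr)] for sr, kr in zip(sums, counts)]

    # Initialise: one trial of every agent on every game.
    for b, p in enumerate(true_win_rates):
        for a in range(len(p)):
            pull(b, a)

    # Allocate the remaining budget to the most "uncertain" decisions.
    while total < budget:
        b, a = select_next(current_means(), counts, total)
        pull(b, a)

    picks = [max(range(len(m)), key=m.__getitem__) for m in current_means()]
    return picks, counts
```

Calling `identify_best_agents([[0.7, 0.4, 0.5], [0.2, 0.6]], budget=200)` returns one recommended agent index per game along with the per-arm trial counts. Under this scoring rule, games whose top agents are hard to separate tend to receive more of the budget than games with a clear winner.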
