Automatically Reinforcing a Game AI
David L. St-Pierre, Jean-Baptiste Hoock, Jialin Liu, Fabien Teytaud, and Olivier Teytaud

TL;DR
This paper applies portfolio methods to game-playing AI: a single Game Playing Program (GPP) is decomposed into multiple variants (different parameters or random seeds), and a portfolio over those variants is trained offline or online, yielding a stronger and more robust player.
Contribution
It introduces two offline portfolio approaches, BestArm and Nash-portfolio, and an online bandit-based method to improve game AI robustness and strength.
Findings
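Nash-portfolio is much more robust than BestArm against an opponent that repeats games and learns.
Both offline methods perform well against the original GPP in a one-game test; BestArm performs poorly once the opponent can repeat games and learn.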
Online bandit approach adapts effectively to game conditions.
Abstract
A recent research trend in Artificial Intelligence (AI) is the combination of several programs into one single, stronger, program; this is termed portfolio methods. We here investigate the application of such methods to Game Playing Programs (GPPs). In addition, we consider the case in which only one GPP is available - by decomposing this single GPP into several ones through the use of parameters or even simply random seeds. These portfolio methods are trained in a learning phase. We propose two different offline approaches. The simplest one, BestArm, is a straightforward optimization of seeds or parameters; it performs quite well against the original GPP, but performs poorly against an opponent which repeats games and learns. The second one, namely Nash-portfolio, performs similarly in a "one game" test, and is much more robust against an opponent who learns. We also propose an…
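The contrast between the two offline approaches can be illustrated with a small sketch. It rests on assumptions not stated in this summary: that the learning phase produces a matrix `wins[i][j]` holding the empirical win rate of candidate seed `i` against opponent variant `j`, that BestArm picks the row with the best average, and that Nash-portfolio plays a mixed strategy over rows that maximizes the worst-case win rate. The names `best_arm`, `nash_portfolio`, `worst_case`, and `TOY_WINS` are hypothetical, and the Hedge (multiplicative weights) update used here is a stand-in solver chosen for brevity, not necessarily the authors' method.

```python
import math

# Hypothetical win-rate matrix: rows are our candidate seeds,
# columns are opponent variants (values invented for illustration).
TOY_WINS = [
    [1.0, 1.0, 0.0],
    [0.6, 0.6, 0.6],
    [0.0, 1.0, 1.0],
]

def best_arm(wins):
    """Index of the row with the highest average win rate (first one on ties)."""
    means = [sum(row) / len(row) for row in wins]
    return max(range(len(wins)), key=means.__getitem__)

def worst_case(wins, mix):
    """Win rate of the mixed strategy `mix` against its most damaging column."""
    n_cols = len(wins[0])
    return min(sum(p * row[j] for p, row in zip(mix, wins)) for j in range(n_cols))

def nash_portfolio(wins, rounds=20000):
    """Approximate a maximin mixed strategy over rows with the Hedge update.

    Assumption: this is a stand-in for an exact zero-sum game solver.
    """
    k, n_cols = len(wins), len(wins[0])
    eta = math.sqrt(8.0 * math.log(k) / rounds)  # standard Hedge step size
    cumulative = [0.0] * k  # total payoff each row would have earned so far
    average = [0.0] * k     # running average of the mixes played
    for _ in range(rounds):
        top = max(cumulative)  # subtract the max for numerical stability
        weights = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(weights)
        mix = [w / total for w in weights]
        # The opponent answers with the column that hurts the current mix most.
        j = min(range(n_cols),
                key=lambda col: sum(p * row[col] for p, row in zip(mix, wins)))
        for i in range(k):
            cumulative[i] += wins[i][j]
            average[i] += mix[i] / rounds
    return average

if __name__ == "__main__":
    arm = best_arm(TOY_WINS)
    pure = [1.0 if i == arm else 0.0 for i in range(len(TOY_WINS))]
    mix = nash_portfolio(TOY_WINS)
    print("BestArm row:", arm, "worst case:", worst_case(TOY_WINS, pure))
    print("Nash mix:", mix, "worst case:", worst_case(TOY_WINS, mix))
```

On this toy matrix the row with the best average loses every game against one particular column, which mirrors the weakness against a learning opponent described in the abstract, while the mixed strategy keeps a guaranteed floor on its win rate.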
Taxonomy
Topics: Advanced Bandit Algorithms Research · Artificial Intelligence in Games · Reinforcement Learning in Robotics
