Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games
JB Lanier, Nathan Monette, Pierre Baldi, Roy Fox

TL;DR
This paper introduces Data-Augmented Game Starts (DAGS), a method that starts reinforcement learning episodes at intermediate states sampled from offline human demonstrations in complex imperfect-information games, with the aim of accelerating exploration and improving equilibrium approximation.
Contribution
The paper proposes DAGS, a multi-agent starting-state sampling strategy for regularized policy-gradient methods in two-player zero-sum imperfect-information games, which uses offline data to accelerate exploration and improve equilibrium quality.
Findings
DAGS achieves lower exploitability in synthetic and real games.
Augmenting starting states enhances exploration in challenging games.
Multi-task observation flags mitigate bias introduced by data augmentation.
Abstract
Finding approximate equilibria for large-scale imperfect-information competitive games such as StarCraft, Dota, and CounterStrike remains computationally infeasible due to sparse rewards and challenging exploration over long horizons. In this paper, we propose a multi-agent starting-state sampling strategy designed to substantially accelerate online exploration in regularized policy-gradient game methods for two-player zero-sum (2p0s) games. Motivated by an assumption that offline demonstrations from skilled humans can provide good coverage of high-level strategies relevant to equilibrium play, we propose the initialization of reinforcement learning data collection at intermediate states sampled from offline data to facilitate exploration of strategically relevant subgames. Referring to this method as Data-Augmented Game Starts (DAGS), we perform experiments using synthetic datasets and…
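The starting-state idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `StartState`, `sample_start_state`, `add_start_flag`, and the mixing probability `p_augment` are hypothetical, and the 0/1 flag appended to the observation is only one assumed reading of the "multi-task observation flags" listed under Findings.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

GameState = Tuple[int, ...]  # hypothetical opaque game state


@dataclass(frozen=True)
class StartState:
    state: GameState
    augmented: bool  # True if the state was drawn from offline data


def sample_start_state(
    default_state: GameState,
    offline_states: Sequence[GameState],
    p_augment: float,
    rng: random.Random,
) -> StartState:
    """Pick the state at which the next self-play episode begins.

    With probability p_augment (an assumed hyperparameter), start from an
    intermediate state sampled uniformly from the offline dataset;
    otherwise start from the game's normal initial state.
    """
    if offline_states and rng.random() < p_augment:
        return StartState(rng.choice(list(offline_states)), True)
    return StartState(default_state, False)


def add_start_flag(observation: Sequence[float], start: StartState) -> List[float]:
    """Append a 0/1 flag so the policy can tell augmented starts from normal ones."""
    return list(observation) + [1.0 if start.augmented else 0.0]
```

In a training loop, `sample_start_state` would be called once per episode, and `add_start_flag` would wrap each player's observation before it reaches the policy. Evaluation would then use only the normal start with the flag set to 0, again under the assumptions above.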
