TL;DR
This paper introduces a model-based approach where an oracle planner with full state access guides the training of an agent in complex imperfect-information games, enabling efficient strategy learning with limited data.
Contribution
The paper distills the strategies of an oracle planner, which sees the full game state, into a learning agent for large imperfect-information games, improving on model-free methods.
Findings
A planner that uses fixed-depth tree search with decoupled Thompson sampling outperforms naive Monte Carlo tree search in large combinatorial action spaces.
The follower policy learns effective strategies after training on a few hundred battles.
The approach successfully applies to complex games like Clash Royale and Pommerman.
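To make the first finding concrete, here is a minimal sketch of decoupled Thompson sampling at a single decision node of a two-player simultaneous-move game. It is an illustration under stated assumptions, not the paper's planner: the Beta-Bernoulli win/loss model, the class name `DecoupledThompsonNode`, the helpers `run_search` and `visit_counts`, and the toy win-probability matrix are all hypothetical. "Decoupled" here means each player keeps statistics only over its own actions and samples independently, so the node never enumerates the joint action space.

```python
import random


class DecoupledThompsonNode:
    """One decision node of a two-player simultaneous-move game (toy sketch)."""

    def __init__(self, n_actions_p1, n_actions_p2):
        # Assumption: Beta(1, 1) prior per action, stored as [alpha, beta].
        # stats[player][action] tracks that player's own wins and losses.
        self.stats = [
            [[1.0, 1.0] for _ in range(n_actions_p1)],
            [[1.0, 1.0] for _ in range(n_actions_p2)],
        ]

    def select(self, rng):
        """Each player samples a win rate per own action and picks the best."""
        joint = []
        for player_stats in self.stats:
            samples = [rng.betavariate(a, b) for a, b in player_stats]
            joint.append(max(range(len(samples)), key=samples.__getitem__))
        return tuple(joint)

    def update(self, joint_action, p1_won):
        """Credit the outcome to each player's own chosen action."""
        a1, a2 = joint_action
        # Index 0 is alpha (wins), index 1 is beta (losses).
        self.stats[0][a1][0 if p1_won else 1] += 1.0
        # Player 2 wins exactly when player 1 loses.
        self.stats[1][a2][1 if p1_won else 0] += 1.0


def run_search(win_prob, n_iter, seed=0):
    """Run n_iter simulated playouts; win_prob[a1][a2] is P(player 1 wins)."""
    rng = random.Random(seed)
    node = DecoupledThompsonNode(len(win_prob), len(win_prob[0]))
    for _ in range(n_iter):
        a1, a2 = node.select(rng)
        p1_won = rng.random() < win_prob[a1][a2]  # hypothetical playout result
        node.update((a1, a2), p1_won)
    return node


def visit_counts(node, player):
    """Number of times each of the player's actions was selected."""
    return [int(a + b - 2) for a, b in node.stats[player]]
```

For example, `visit_counts(run_search([[0.9, 0.8], [0.2, 0.1]], 2000), 0)` reports how often player 1 tried each action; with this toy matrix, the first action is far stronger and should receive most of the visits.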
Abstract
We consider learning to play multiplayer imperfect-information games with simultaneous moves and large state-action spaces. Previous attempts to tackle such challenging games have largely focused on model-free learning methods, often requiring hundreds of years of experience to produce competitive agents. Our approach is based on model-based planning. We tackle the problem of partial observability by first building an (oracle) planner that has access to the full state of the environment and then distilling the knowledge of the oracle to a (follower) agent which is trained to play the imperfect-information game by imitating the oracle's choices. We experimentally show that planning with naive Monte Carlo tree search does not perform very well in large combinatorial action spaces. We therefore propose planning with a fixed-depth tree search and decoupled Thompson sampling for action…
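The oracle-to-follower distillation described in the abstract can be illustrated with a toy imitation setup. This is a sketch under loud assumptions rather than the authors' training procedure: the full state is a hypothetical `(visible, hidden)` pair, `oracle_action` is a stand-in rule in place of a real planner, and the follower is a simple lookup table that plays the oracle's most frequent action for each observation, swapped in for whatever learned policy the paper trains.

```python
from collections import Counter, defaultdict


def oracle_action(full_state):
    """Hypothetical oracle: decides using both visible and hidden parts."""
    visible, hidden = full_state
    return (visible + hidden) % 3


def observe(full_state):
    """The follower only sees the visible part of the state."""
    return full_state[0]


class TabularFollower:
    """Imitates the oracle by counting its actions per observation."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, dataset):
        # dataset: iterable of (observation, oracle_action) pairs.
        for obs, action in dataset:
            self.counts[obs][action] += 1

    def act(self, obs):
        # Play the action the oracle chose most often under this observation.
        return self.counts[obs].most_common(1)[0][0]


# Toy data: the hidden value is 0 in three of every four states.
states = [(v, h) for v in range(3) for h in (0, 0, 0, 1)]
dataset = [(observe(s), oracle_action(s)) for s in states]

follower = TabularFollower()
follower.fit(dataset)
```

Because the follower cannot see the hidden value, it cannot match the oracle on every state; it settles on the action that is best on average given what it observes, which is the sense in which the follower plays the imperfect-information game.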
