Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration
Dennis J. N. J. Soemers, Éric Piette, Matthew Stephenson, Cameron Browne

TL;DR
This paper explores three methods to manipulate experience data in Expert Iteration self-play learning, aiming to improve training efficiency and performance across various board games.
Contribution
It introduces and evaluates three novel data manipulation techniques within the Expert Iteration framework to enhance self-play learning.
Findings
Major early training improvements in some games
Minor average improvements across fourteen games
Effective data manipulation strategies can boost self-play learning
Abstract
Expert Iteration (ExIt) is an effective framework for learning game-playing policies from self-play. ExIt involves training a policy to mimic the search behaviour of a tree search algorithm, such as Monte-Carlo tree search, and using the trained policy to guide it. The policy and the tree search can then iteratively improve each other, through experience gathered in self-play between instances of the guided tree search algorithm. This paper outlines three different approaches for manipulating the distribution of data collected from self-play, and the procedure that samples batches for learning updates from the collected data. Firstly, samples in batches are weighted based on the durations of the episodes in which they were originally experienced. Secondly, Prioritized Experience Replay is applied within the ExIt framework, to prioritise sampling experience from which we expect to…
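The first two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact weighting formula or which signal serves as the priority, so the `1 / episode_length` weighting, the `Sample` record, and all function names and default values below are assumptions made for illustration. The Prioritized Experience Replay part follows the standard proportional variant, where sample i is drawn with probability p_i^alpha / sum_k p_k^alpha and corrected with importance weights (N * P(i))^(-beta).

```python
import random
from collections import namedtuple

# Hypothetical record: one stored position plus the length of the
# self-play episode it came from (names are illustrative only).
Sample = namedtuple("Sample", ["state_id", "episode_id", "episode_length"])


def duration_weights(samples):
    """Assumed scheme: weight each sample by 1 / episode_length and
    normalise, so every episode carries the same total probability mass
    regardless of how many positions it contributed."""
    raw = [1.0 / s.episode_length for s in samples]
    total = sum(raw)
    return [w / total for w in raw]


def per_probabilities(priorities, alpha=0.6):
    """Proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def per_importance_weights(probs, beta=0.4):
    """Importance-sampling correction (N * P(i))^(-beta), scaled so the
    largest weight equals 1."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    max_w = max(raw)
    return [w / max_w for w in raw]


def sample_batch(samples, probs, batch_size, rng):
    """Draw a batch with replacement according to the given probabilities."""
    return rng.choices(samples, weights=probs, k=batch_size)


if __name__ == "__main__":
    # Toy buffer: a short episode "A" (2 positions) and a long one "B" (8).
    buffer = [Sample(i, "A", 2) for i in range(2)]
    buffer += [Sample(i, "B", 8) for i in range(8)]

    weights = duration_weights(buffer)
    batch = sample_batch(buffer, weights, batch_size=4, rng=random.Random(0))
    print([s.episode_id for s in batch])
```

Under the assumed weighting, the two positions of episode A and the eight positions of episode B each receive half of the total sampling probability, whereas uniform sampling would give B four times the mass of A.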
Taxonomy
Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Educational Games and Gamification
Methods: Prioritized Experience Replay · Monte-Carlo Tree Search · Experience Replay
