Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration

Dennis J. N. J. Soemers; Éric Piette; Matthew Stephenson; Cameron Browne

arXiv:2006.00283·cs.LG·June 2, 2020



TL;DR

This paper explores three methods to manipulate experience data in Expert Iteration self-play learning, aiming to improve training efficiency and performance across various board games.

Contribution

It introduces and evaluates three novel data manipulation techniques within the Expert Iteration framework to enhance self-play learning.

Findings

01  Major early training improvements in some games

02  Minor average improvements across fourteen games

03  Effective data manipulation strategies can boost self-play learning

Abstract

Expert Iteration (ExIt) is an effective framework for learning game-playing policies from self-play. ExIt involves training a policy to mimic the search behaviour of a tree search algorithm - such as Monte-Carlo tree search - and using the trained policy to guide it. The policy and the tree search can then iteratively improve each other, through experience gathered in self-play between instances of the guided tree search algorithm. This paper outlines three different approaches for manipulating the distribution of data collected from self-play, and the procedure that samples batches for learning updates from the collected data. Firstly, samples in batches are weighted based on the durations of the episodes in which they were originally experienced. Secondly, Prioritized Experience Replay is applied within the ExIt framework, to prioritise sampling experience from which we expect to…
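
To make the first two approaches in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "weighted based on the durations of the episodes" means weights proportional to the inverse of the episode length, and it uses the standard proportional form of Prioritized Experience Replay, P(i) = p_i^alpha / sum_k p_k^alpha. The function names, the default `alpha` value, and the sampling helper are all hypothetical and chosen for illustration only.

```python
import random
from typing import List, Sequence


def duration_weights(episode_lengths: Sequence[int]) -> List[float]:
    """Normalised per-sample weights proportional to 1 / episode length.

    ASSUMPTION: the inverse-length form is an illustrative choice; the
    abstract only says weights depend on episode duration.
    episode_lengths[i] is the length of the episode that produced sample i.
    """
    raw = [1.0 / n for n in episode_lengths]
    total = sum(raw)
    return [w / total for w in raw]


def per_probabilities(priorities: Sequence[float], alpha: float = 0.6) -> List[float]:
    """Proportional PER sampling probabilities: p_i^alpha / sum_k p_k^alpha.

    alpha = 0 gives uniform sampling; larger alpha favours high priorities.
    The default of 0.6 is a placeholder, not a value from the paper.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_batch(probabilities: Sequence[float], batch_size: int,
                 rng: random.Random) -> List[int]:
    """Draw buffer indices (with replacement) according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=batch_size)


if __name__ == "__main__":
    # Two samples from a short episode, four from a long one (hypothetical data).
    lengths = [10, 10, 40, 40, 40, 40]
    print(duration_weights(lengths))
    probs = per_probabilities([4.0, 1.0, 1.0], alpha=0.5)
    print(sample_batch(probs, batch_size=5, rng=random.Random(0)))
```

Under the inverse-length assumption, samples from short episodes receive larger individual weights, so long episodes do not dominate a batch merely because they contribute more samples.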


Code & Models

Repositories

Ludeme/LudiiAI


Taxonomy

Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Educational Games and Gamification

Methods: Prioritized Experience Replay · Monte-Carlo Tree Search · Experience Replay