Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search

Arta Seify; Michael Buro

arXiv:2005.11335·cs.LG·May 26, 2020·1 cites

Open Access

TL;DR

This paper introduces a Monte-Carlo Tree Search variant that combines action value normalization for unbounded rewards, virtual-loss parallelization, and a policy network trained by self-play, and reports improved single-agent optimization performance in SameGame.

Contribution

It presents a new MCTS-based algorithm with normalization, virtual loss for parallelization, and a self-trained policy network for single-agent optimization.

Findings

01. Outperforms baseline algorithms on various board sizes

02. Competitive with state-of-the-art search methods on benchmark positions

03. Effective in optimizing single-agent game scenarios

Abstract

The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in two-player perfect-information games. In this paper, we describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards (which is the case in many optimization problems), 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search. We gauge the effectiveness of our method in "SameGame", a popular single-player test domain. Our experimental results indicate that our method outperforms baseline algorithms on several board sizes. Additionally, it is competitive with state-of-the-art search algorithms on a public set of positions.
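
The abstract does not state the normalization formula or the virtual loss definition, so the following is only an illustrative sketch of how the three ingredients could fit together in a selection step, not the authors' method. It assumes a generic min-max normalization of raw returns, a PUCT-style score weighted by a policy prior, and a virtual loss that treats in-flight visits as worst-case outcomes. All names (`Node`, `normalize`, `select_child`, `c_puct`) are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                 # policy-network probability of the move leading here
    visits: int = 0              # completed simulations through this node
    value_sum: float = 0.0       # sum of raw, possibly unbounded, returns
    virtual_visits: int = 0      # simulations currently in flight on other threads
    children: dict = field(default_factory=dict)

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def normalize(q: float, q_min: float, q_max: float) -> float:
    """Min-max map a raw return into [0, 1] (assumed scheme, not from the paper)."""
    if q_max <= q_min:
        return 0.5  # degenerate range: no information to rank by
    return (q - q_min) / (q_max - q_min)


def select_child(parent: Node, q_min: float, q_max: float, c_puct: float = 1.5):
    """Pick the action with the highest normalized-value-plus-exploration score."""
    total = sum(c.visits + c.virtual_visits for c in parent.children.values())
    best_action, best_score = None, -math.inf
    for action, child in parent.children.items():
        n = child.visits + child.virtual_visits
        # Assumption: each in-flight visit counts as a normalized value of 0,
        # which steers other threads away from the same branch.
        q = child.visits * normalize(child.mean_value(), q_min, q_max) / n if n else 0.0
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + n)
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action
```

In this sketch a thread would increment `virtual_visits` along its path before running a simulation and decrement it again during backup, while `q_min` and `q_max` would be tracked from the returns observed so far in the search.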

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Sports Analytics and Performance

Methods: Monte-Carlo Tree Search