Monte-Carlo Tree Search as Regularized Policy Optimization

Jean-Bastien Grill; Florent Altché; Yunhao Tang; Thomas Hubert; Michal Valko; Ioannis Antonoglou; Rémi Munos

arXiv:2007.12509 · cs.LG · July 27, 2020 · 34 citations

TL;DR

This paper reveals that AlphaZero's heuristics are approximations to a regularized policy optimization problem and introduces a variant that improves performance by solving this problem exactly.

Contribution

The paper gives a theoretical explanation of AlphaZero's search heuristics and proposes an improved algorithm that uses the exact solution of the regularized policy optimization problem instead of the heuristic approximation.
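For reference, the regularized problem can be written as below. The notation is our reading of the paper, not a quotation: q holds the search's Q-value estimates at a node, π_θ is the prior policy produced by the network, n_b is the visit count of action b, c is an exploration constant, S is the probability simplex over the action set A, and KL[π_θ, y] = Σ_a π_θ(a) log(π_θ(a) / y(a)). Setting the derivative of the Lagrangian to zero gives the closed form on the second line, where α is the scalar that normalizes the result.

```latex
\begin{aligned}
\bar{\pi} &= \arg\max_{y \in \mathcal{S}} \Big[ q^\top y - \lambda_N \, \mathrm{KL}\big[\pi_\theta, y\big] \Big],
\qquad
\lambda_N = c \, \frac{\sqrt{\sum_b n_b}}{|\mathcal{A}| + \sum_b n_b} \\[6pt]
\bar{\pi}(a) &= \lambda_N \, \frac{\pi_\theta(a)}{\alpha - q_a},
\qquad
\alpha \ \text{chosen so that} \ \sum_a \bar{\pi}(a) = 1
\end{aligned}
```

Because λ_N shrinks roughly like 1/√N as the total visit count N grows, the regularization toward the prior weakens as search accumulates evidence.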

Findings

01

The proposed variant outperforms AlphaZero in multiple domains.

02

AlphaZero's heuristics approximate a regularized policy optimization solution.

03

The new method improves on AlphaZero reliably across the domains tested, not only in isolated cases.
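A small numerical check of how PUCT-style action selection can be read through the regularized objective. It assumes a constant exploration coefficient c (AlphaZero's published coefficient varies slowly with the visit count) and takes the empirical visit distribution to be π̂(a) = (1 + n_a) / (|A| + N), with N the total visit count. Under those assumptions the PUCT score q_a + c·π_θ(a)·√N / (1 + n_a) equals q_a + λ_N·π_θ(a) / π̂(a) term by term, since λ_N / π̂(a) = c√N / (1 + n_a). All function names and numbers below are hypothetical.

```python
import numpy as np


def empirical_visit_distribution(visit_counts):
    """pi_hat(a) = (1 + n_a) / (|A| + N), with N the total visit count."""
    n = np.asarray(visit_counts, dtype=float)
    return (1.0 + n) / (len(n) + n.sum())


def puct_scores(q, prior, visit_counts, c=1.25):
    """PUCT score q_a + c * prior_a * sqrt(N) / (1 + n_a), with constant c."""
    n = np.asarray(visit_counts, dtype=float)
    bonus = c * np.asarray(prior, dtype=float) * np.sqrt(n.sum()) / (1.0 + n)
    return np.asarray(q, dtype=float) + bonus


def regularized_scores(q, prior, visit_counts, c=1.25):
    """q_a + lambda_N * prior_a / pi_hat_a, lambda_N = c * sqrt(N) / (|A| + N)."""
    n = np.asarray(visit_counts, dtype=float)
    lam = c * np.sqrt(n.sum()) / (len(n) + n.sum())
    pi_hat = empirical_visit_distribution(n)
    return np.asarray(q, dtype=float) + lam * np.asarray(prior, dtype=float) / pi_hat


q = [0.1, 0.5, 0.3]       # hypothetical Q-value estimates
prior = [0.5, 0.2, 0.3]   # hypothetical prior policy
counts = [10, 5, 5]       # hypothetical visit counts
a = puct_scores(q, prior, counts)
b = regularized_scores(q, prior, counts)
print(a, b, np.argmax(a) == np.argmax(b))
```

The two score vectors agree, so both rules pick the same action at every step of the search.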

Abstract

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
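A minimal sketch of computing the exact solution, assuming the closed form π̄(a) = λ_N·π_θ(a) / (α − q_a) with λ_N = c·√N / (|A| + N) (our reading of the paper; check it against the original). The normalizer α is found by bisection: the total mass decreases monotonically in α, is at least 1 at α = max_b(q_b + λ_N·π_θ(b)), and is at most 1 at α = max_b q_b + λ_N. The function names, the value c = 1.25, and the example numbers are hypothetical.

```python
import numpy as np


def regularization_weight(visit_counts, c=1.25):
    """lambda_N = c * sqrt(N) / (|A| + N), where N is the total visit count."""
    n_total = float(np.sum(visit_counts))
    return c * np.sqrt(n_total) / (len(visit_counts) + n_total)


def solve_pi_bar(q, prior, lam, iters=100):
    """Maximize q.y - lam * KL(prior, y) over the probability simplex.

    Stationarity gives y(a) = lam * prior(a) / (alpha - q(a)); alpha is the
    scalar that makes y sum to 1, located here by bisection.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    q = np.asarray(q, dtype=float)
    prior = np.asarray(prior, dtype=float)
    # Mass is >= 1 at alpha_lo and <= 1 at alpha_hi.
    alpha_lo = np.max(q + lam * prior)
    alpha_hi = np.max(q) + lam
    for _ in range(iters):
        alpha = 0.5 * (alpha_lo + alpha_hi)
        mass = np.sum(lam * prior / (alpha - q))
        if mass > 1.0:
            alpha_lo = alpha  # too much mass: move alpha up
        else:
            alpha_hi = alpha  # too little (or exact): move alpha down
    pi_bar = lam * prior / (alpha_hi - q)
    return pi_bar / pi_bar.sum()  # absorb the tiny residual bisection error


q = [0.1, 0.5, 0.3]       # hypothetical Q-value estimates
prior = [0.5, 0.2, 0.3]   # hypothetical prior policy
counts = [10, 5, 5]       # hypothetical visit counts
lam = regularization_weight(counts)
pi_bar = solve_pi_bar(q, prior, lam)
print(pi_bar)
```

Bisection is used here because it needs no derivative and is guaranteed to stay inside the bracket where every denominator α − q_a is positive. With a large λ_N the result stays close to the prior, while a small λ_N shifts mass toward the actions with the highest Q-values.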

Code & Models

Datasets

misovalko/my-research-papers · dataset · 21 downloads

Videos

Monte-Carlo Tree Search as Regularized Policy Optimization · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Machine Learning and Data Classification

Methods: AlphaZero · Monte-Carlo Tree Search