Monte Carlo Tree Search with Boltzmann Exploration
Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda

TL;DR
This paper introduces Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), novel algorithms that improve exploration in Monte Carlo Tree Search by incorporating Boltzmann policies, addressing limitations of previous entropy-based methods.
Contribution
The paper presents two new algorithms, BTS and DENTS, that enhance exploration in MCTS using Boltzmann policies while overcoming limitations of existing maximum entropy approaches.
Findings
BTS and DENTS outperform existing methods in benchmark tests.
Both algorithms retain the Boltzmann-policy benefit of faster action sampling via the Alias method.
Consistent high performance demonstrated in Go and other domains.
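The Alias method referred to in these findings is a standard technique for drawing repeatedly from a fixed discrete distribution. A minimal sketch follows, assuming a plain softmax (Boltzmann) distribution over hypothetical action-value estimates with a fixed temperature. It is not the exact BTS or DENTS policy from the paper, and the function names are illustrative. Vose's variant of the alias method builds a table in O(n) time, after which each draw costs O(1) regardless of the number of actions.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax over action values: p(a) is proportional to exp(Q(a) / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def build_alias_table(probs):
    """Vose's alias method: O(n) preprocessing so that each later draw is O(1)."""
    n = len(probs)
    scaled = [p * n for p in probs]  # rescale so the average column height is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        g = large.pop()
        prob[s] = scaled[s]  # column s keeps its own mass ...
        alias[s] = g  # ... and is topped up with mass taken from column g
        scaled[g] += scaled[s] - 1.0
        (small if scaled[g] < 1.0 else large).append(g)
    # Any remaining columns are full (height 1 up to floating-point error).
    for i in large + small:
        prob[i] = 1.0
    return prob, alias


def alias_sample(prob, alias, rng=random):
    """Draw one action index: pick a column uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Usage: build the table once, then reuse it for many O(1) draws.
q_values = [1.0, 0.5, 0.0]  # hypothetical action-value estimates
probs = boltzmann_probs(q_values, temperature=0.5)
prob, alias = build_alias_table(probs)
rng = random.Random(0)
counts = [0] * len(q_values)
for _ in range(100_000):
    counts[alias_sample(prob, alias, rng)] += 1
print([round(c / 100_000, 3) for c in counts], [round(p, 3) for p in probs])
```

Building the table costs O(n), so the saving comes from reusing it across several draws. Whenever the value estimates change, the table has to be rebuilt.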
Abstract
Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the maximum entropy principle into an MCTS approach, utilising Boltzmann policies to sample actions, naturally encouraging more exploration. In this paper, we highlight a major limitation of MENTS: optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), that address these limitations and preserve the benefits of Boltzmann policies, such as allowing actions to be sampled faster by using the Alias method. Our empirical…
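To make the stated limitation concrete, here is a toy calculation under explicit assumptions: a hypothetical root with two actions, no intermediate rewards, and the entropy-regularised (soft) value α log Σ exp(Q/α) of the kind used by maximum-entropy methods such as MENTS. The tree, the values, and the temperature α are invented for illustration and do not come from the paper.

```python
import math


def soft_value(q_values, alpha):
    """Entropy-regularised (soft) value: alpha * log(sum_a exp(Q(a) / alpha))."""
    m = max(q_values)  # shift by the max for numerical stability
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def hard_value(q_values):
    """Value under the original objective: the best action's value."""
    return max(q_values)


# Hypothetical two-action root (assumed numbers):
#   "safe"   leads to a state with one action worth 1.0
#   "spread" leads to a state with ten actions, each worth 0.9
alpha = 0.1  # assumed entropy temperature
children = {"safe": [1.0], "spread": [0.9] * 10}

hard = {a: hard_value(q) for a, q in children.items()}
soft = {a: soft_value(q, alpha) for a, q in children.items()}

print("original objective prefers:", max(hard, key=hard.get))  # safe
print("max-entropy objective prefers:", max(soft, key=soft.get))  # spread
```

With n equally valued actions the soft value is q + α ln n, so the "spread" branch scores 0.9 + 0.1 ln 10 ≈ 1.13, beating the 1.0 of the "safe" branch even though "safe" is the better action under the original objective.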
Taxonomy
Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Reinforcement Learning in Robotics
