Towards Understanding the Effects of Evolving the MCTS UCT Selection Policy
Fred Valdez Ameneyro, Edgar Galvan

TL;DR
This paper investigates how evolving the UCT selection policy in Monte Carlo Tree Search affects performance across different types of functions, revealing scenarios where evolved policies outperform standard UCT.
Contribution
It provides an in-depth analysis of evolved UCT policies across diverse problem types, highlighting when evolution improves MCTS performance.
Findings
Evolved UCT policies outperform standard UCT on multimodal and deceptive functions.
Standard UCT remains robust on unimodal functions.
Evolved policies are competitive in other scenarios.
Abstract
Monte Carlo Tree Search (MCTS) is a sampling best-first method to search for optimal decisions. The success of MCTS depends heavily on how the MCTS statistical tree is built, and the selection policy plays a fundamental role in this. One selection policy that works particularly well, and is widely adopted in MCTS, is Upper Confidence Bounds for Trees, referred to as UCT. Other, more sophisticated bounds have been proposed by the community with the goal of improving MCTS performance on particular problems. Thus, it is evident that while MCTS UCT generally behaves well, some variants might behave better. As a result, multiple works have proposed evolving a selection policy to be used in MCTS. Although all these works are inspiring, none of them has carried out an in-depth analysis shedding light on the circumstances under which an evolved alternative of MCTS UCT might be…
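For readers unfamiliar with the selection policy the abstract refers to, the following is a minimal sketch of the commonly cited UCT scoring rule, under stated assumptions: the names `uct_score` and `select_child`, the `(total_reward, visits)` tuple layout, and the default exploration constant `c = sqrt(2)` are illustrative choices, not taken from the paper. The score adds an exploitation term (mean reward) to an exploration bonus that shrinks as a child is visited more often.

```python
import math


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Score one child with the commonly cited UCT rule (illustrative sketch).

    total_reward / visits is the exploitation term; the square-root term is
    the exploration bonus. The constant c is an assumed default.
    """
    if visits == 0:
        # Unvisited children are tried first.
        return math.inf
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the highest-scoring child.

    children is a list of (total_reward, visits) tuples, a hypothetical
    layout chosen only to keep this example short.
    """
    return max(
        range(len(children)),
        key=lambda i: uct_score(children[i][0], children[i][1], parent_visits, c),
    )


if __name__ == "__main__":
    children = [(9.0, 10), (1.0, 2)]
    # With c = 0 the choice is purely greedy on mean reward.
    print(select_child(children, parent_visits=12, c=0.0))
    # A larger c favours the less-visited child.
    print(select_child(children, parent_visits=12, c=2.0))
```

In an evolved-policy setting such as the one the abstract describes, the expression inside `uct_score` is the part that would be replaced by an evolved formula; the evolved formulas themselves are not shown in this excerpt, so none are reproduced here.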
