Power Mean Estimation in Stochastic Monte-Carlo Tree Search

Tuan Dam; Odalric-Ambrym Maillard; Emilie Kaufmann

arXiv:2406.02235 · cs.AI · June 5, 2024

TL;DR

This paper introduces Stochastic-Power-UCT, an MCTS algorithm using the power mean estimator for stochastic environments, providing theoretical convergence guarantees and empirical validation.

Contribution

The paper develops a new MCTS algorithm, built on the power mean estimator and tailored to stochastic MDPs, and proves that it converges at a polynomial rate.

Findings

1. Shares the same convergence rate of O(n^{-1/2}) as Fixed-Depth-MCTS
2. Theoretical analysis confirms polynomial convergence in stochastic MDPs
3. Empirical tests validate the theoretical results across various environments

Abstract

Monte-Carlo Tree Search (MCTS) is a widely used strategy for online planning that combines Monte-Carlo sampling with forward tree search. Its success relies on the Upper Confidence bound for Trees (UCT) algorithm, an extension of the UCB method for multi-armed bandits. However, the theoretical foundation of UCT is incomplete due to an error in the logarithmic bonus term for action selection, leading to the development of Fixed-Depth-MCTS with a polynomial exploration bonus to balance exploration and exploitation \citep{shah2022journal}. Both UCT and Fixed-Depth-MCTS suffer from biased value estimation: the weighted sum underestimates the optimal value, while the maximum valuation overestimates it \citep{coulom2006efficient}. The power mean estimator offers a balanced solution, lying between the average and maximum values. Power-UCT \citep{dam2019generalized} incorporates this estimator…
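To make the "between the average and maximum" claim concrete, here is a minimal Python sketch of a weighted power mean. It is an illustration only, not the paper's implementation: the function name `power_mean`, the use of visit counts as weights, and the sample numbers are all assumptions, and values are assumed non-negative (for example, rewards scaled to [0, 1]).

```python
def power_mean(values, weights, p):
    """Weighted power mean (sum_i w_i * x_i**p / sum_i w_i) ** (1/p).

    Hypothetical helper for illustration. Assumes non-negative values
    and an exponent p >= 1.
    """
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    moment = sum(w * v ** p for v, w in zip(values, weights)) / total
    return moment ** (1.0 / p)


# Made-up child value estimates and visit counts (assumed weights).
child_values = [0.2, 0.5, 0.9]
visit_counts = [10, 30, 60]

average = power_mean(child_values, visit_counts, p=1)   # plain weighted average
middle = power_mean(child_values, visit_counts, p=4)    # pulled toward the best child
near_max = power_mean(child_values, visit_counts, p=64) # close to max(child_values)
```

With p = 1 the function reduces to the weighted average, and as p grows the result moves monotonically toward the largest value without exceeding it, which is the trade-off between underestimation and overestimation that the abstract describes.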

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms