Generalized Mean Estimation in Monte-Carlo Tree Search

Tuan Dam; Pascal Klink; Carlo D'Eramo; Jan Peters; Joni Pajarinen

arXiv:1911.00384·cs.AI·July 14, 2020

TL;DR

This paper introduces Power-UCT, a novel backup strategy for Monte-Carlo Tree Search that uses the power mean operator to improve convergence speed and accuracy in MDPs and POMDPs.

Contribution

The paper proposes Power-UCT, a new MCTS backup method using the power mean, with theoretical convergence guarantees and empirical performance improvements.

Findings

01 Power-UCT converges faster than UCT with its standard average backup.

02 Empirical results in MDPs and POMDPs show significant performance improvements.

03 Theoretical analysis provides a guarantee of convergence to the optimal solution.

Abstract

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further, proposing a novel backup strategy based on the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT, and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our…
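The abstract describes the power mean as an operator whose output lies between the average and the maximum. The following is a minimal Python sketch of a weighted power mean, not the authors' implementation. The function name `power_mean`, the use of child visit counts as weights, and the restriction to non-negative values with exponent p >= 1 are assumptions made for illustration.

```python
import math


def power_mean(values, weights, p):
    """Weighted power mean of non-negative values for an exponent p >= 1.

    Hypothetical helper for illustration: `values` stand in for child value
    estimates and `weights` for child visit counts (an assumption, not taken
    from the abstract).
    """
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal length")
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    if any(v < 0 for v in values) or any(w < 0 for w in weights):
        raise ValueError("values and weights must be non-negative")

    # Entries with zero weight do not contribute to the mean.
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    if not pairs:
        raise ValueError("at least one weight must be positive")

    top = max(v for v, _ in pairs)
    if math.isinf(p) or top == 0:
        # The limit p -> infinity is the maximum; all-zero values give zero.
        return float(top)

    total = sum(w for _, w in pairs)
    # Divide by the largest value before exponentiating so that large p
    # does not underflow; the result is rescaled afterwards.
    acc = sum(w * (v / top) ** p for v, w in pairs) / total
    return top * acc ** (1.0 / p)


child_values = [0.2, 0.5, 0.9]
visit_counts = [10, 5, 1]

average_backup = power_mean(child_values, visit_counts, 1)
power_backup = power_mean(child_values, visit_counts, 4)
max_backup = power_mean(child_values, visit_counts, math.inf)
```

With p = 1 the function reduces to the visit-weighted average, and as p grows the result moves toward the largest child value, so the exponent acts as a knob between the two backup rules the abstract contrasts.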

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Monte-Carlo Tree Search