An Efficient Dynamic Sampling Policy For Monte Carlo Tree Search

Gongbo Zhang; Yijie Peng; Yilong Xu

arXiv:2204.12043 · cs.AI · May 9, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a dynamic sampling policy for Monte Carlo Tree Search that improves action selection efficiency in finite-horizon Markov decision processes, demonstrated through experiments on Tic-Tac-Toe and Gomoku.

Contribution

It proposes a novel dynamic sampling policy for MCTS that better allocates computational resources to identify the best action.

Findings

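01. The proposed dynamic sampling tree policy is reported to be more efficient than the competing methods it is compared against.

02. Experiments on Tic-Tac-Toe and Gomoku show more efficient selection of the best action at the root node.

03. The policy allocates a limited computational budget to maximize the probability of correctly selecting the best root action in finite-horizon MDPs.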

Abstract

We consider the popular tree-based search strategy within the framework of reinforcement learning, the Monte Carlo Tree Search (MCTS), in the context of a finite-horizon Markov decision process. We propose a dynamic sampling tree policy that efficiently allocates a limited computational budget to maximize the probability of correct selection of the best action at the root node of the tree. Experimental results on Tic-Tac-Toe and Gomoku show that the proposed tree policy is more efficient than other competing methods.
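The objective named in the abstract, the probability of correct selection (PCS) of the best root action under a fixed sampling budget, can be illustrated with a small simulation. The sketch below is not the paper's policy: it uses two well-known stand-ins, equal allocation and UCB1, and assumes each root action returns a Bernoulli win/loss outcome with a fixed win probability. All function names (`equal_allocate`, `ucb1_allocate`, `estimate_pcs`) and parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def equal_allocate(means, budget, rng):
    """Spread the budget round-robin over all root actions (baseline)."""
    k = len(means)
    if budget < k:
        raise ValueError("budget must cover at least one sample per action")
    counts, totals = [0] * k, [0.0] * k
    for t in range(budget):
        a = t % k
        totals[a] += 1.0 if rng.random() < means[a] else 0.0  # assumed Bernoulli rollout
        counts[a] += 1
    return [totals[i] / counts[i] for i in range(k)], counts


def ucb1_allocate(means, budget, rng, c=math.sqrt(2)):
    """Spend the budget with UCB1, a standard MCTS tree policy (not the paper's)."""
    k = len(means)
    if budget < k:
        raise ValueError("budget must cover at least one sample per action")
    counts, totals = [0] * k, [0.0] * k
    for t in range(budget):
        if t < k:
            a = t  # sample every action once before using the index
        else:
            a = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + c * math.sqrt(math.log(t) / counts[i]),
            )
        totals[a] += 1.0 if rng.random() < means[a] else 0.0  # assumed Bernoulli rollout
        counts[a] += 1
    return [totals[i] / counts[i] for i in range(k)], counts


def estimate_pcs(allocate, means, budget, trials, seed=0):
    """Monte Carlo estimate of the probability of picking the truly best action."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda i: means[i])
    correct = 0
    for _ in range(trials):
        estimates, _ = allocate(means, budget, rng)
        chosen = max(range(len(estimates)), key=lambda i: estimates[i])
        correct += chosen == best
    return correct / trials


if __name__ == "__main__":
    true_means = [0.55, 0.50, 0.45, 0.40]  # hypothetical win probabilities
    for name, policy in [("equal", equal_allocate), ("ucb1", ucb1_allocate)]:
        print(name, estimate_pcs(policy, true_means, budget=200, trials=500))
```

Comparing the two printed estimates for a given budget shows how the choice of sampling rule changes the PCS, which is the quantity the paper's policy is designed to maximize. The win probabilities and budget here are made up for the demonstration and say nothing about the paper's reported results.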

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications