An Efficient Dynamic Sampling Policy For Monte Carlo Tree Search
Gongbo Zhang, Yijie Peng, Yilong Xu

TL;DR
This paper introduces a dynamic sampling policy for Monte Carlo Tree Search that improves action selection efficiency in finite-horizon Markov decision processes, demonstrated through experiments on Tic-Tac-Toe and Gomoku.
Contribution
It proposes a novel dynamic sampling policy for MCTS that better allocates computational resources to identify the best action.
Findings
The proposed tree policy is more efficient than the competing methods in the reported experiments.
Experimental results show improved action selection in Tic-Tac-Toe and Gomoku.
The approach enhances MCTS performance in finite-horizon MDPs.
Abstract
We consider Monte Carlo Tree Search (MCTS), a popular tree-based search strategy within the framework of reinforcement learning, in the context of a finite-horizon Markov decision process. We propose a dynamic sampling tree policy that efficiently allocates a limited computational budget to maximize the probability of correct selection of the best action at the root node of the tree. Experimental results on Tic-Tac-Toe and Gomoku show that the proposed tree policy is more efficient than other competing methods.
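To make the objective in the abstract concrete, the following minimal sketch estimates the probability of correct selection (PCS) at a root node when a fixed sampling budget is spent across its actions. It is not the dynamic sampling policy proposed in the paper: the allocation rule here is plain UCB1, used only as a familiar stand-in, and the Gaussian reward model, the function names `ucb1_allocate` and `estimate_pcs`, and every parameter value are assumptions made for illustration.

```python
import math
import random


def ucb1_allocate(true_means, budget, rng, noise_sd=1.0, c=math.sqrt(2)):
    """Spend `budget` noisy samples across root actions using UCB1.

    `true_means` are hypothetical action values, unknown to the sampler.
    Returns (sample_means, counts).
    """
    k = len(true_means)
    if budget < k:
        raise ValueError("budget must cover at least one sample per action")
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        if t < k:
            a = t  # sample each action once to initialise its estimate
        else:
            # pick the action with the largest upper confidence bound
            a = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + c * math.sqrt(math.log(t) / counts[i]),
            )
        # assumed reward model: Gaussian noise around the true action value
        sums[a] += rng.gauss(true_means[a], noise_sd)
        counts[a] += 1
    means = [s / n for s, n in zip(sums, counts)]
    return means, counts


def estimate_pcs(true_means, budget, replications=200, seed=0):
    """Estimate PCS: the fraction of independent runs in which the action
    with the highest sample mean is also the truly best action."""
    rng = random.Random(seed)
    best = max(range(len(true_means)), key=lambda i: true_means[i])
    hits = 0
    for _ in range(replications):
        means, _ = ucb1_allocate(true_means, budget, rng)
        chosen = max(range(len(means)), key=lambda i: means[i])
        hits += chosen == best
    return hits / replications
```

Calling, for example, `estimate_pcs([0.2, 0.5, 0.4], budget=300)` returns a fraction between 0 and 1. Under these assumptions, repeating the call with different budgets, or with a different allocation rule swapped in for `ucb1_allocate`, gives a rough way to compare how efficiently each rule turns samples into a correct choice at the root.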
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications
