An Efficient Algorithm for Thresholding Monte Carlo Tree Search

Shoma Nameki (1), Atsuyoshi Nakamura (2), Junpei Komiyama (3, 4), Koji Tabata (5) ((1) Graduate School of Information Science and Technology, Hokkaido University; (2) Faculty of Information Science and Technology, Hokkaido University; (3) Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); (4) RIKEN AIP; (5) Research Institute for Electronic Science, Hokkaido University)

arXiv:2601.22600·stat.ML·February 2, 2026



TL;DR

This paper presents a new efficient algorithm for the Thresholding Monte Carlo Tree Search problem, improving sample complexity and computational cost through a novel ratio-based modification of existing strategies.

Contribution

The paper introduces a $\delta$-correct sequential sampling algorithm with asymptotically optimal sample complexity, together with a modified D-Tracking strategy that improves empirical performance and reduces per-round computational cost.
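For readers unfamiliar with D-Tracking, the sketch below shows the standard D-Tracking rule from the Track-and-Stop literature (Garivier and Kaufmann), not the paper's ratio-based modification, whose details are not given on this page. It assumes the target sampling proportions `weights` (computed in Track-and-Stop from the current empirical means) are already available; the function name and arguments are hypothetical.

```python
import math
from typing import Sequence


def d_tracking_arm(counts: Sequence[int], weights: Sequence[float]) -> int:
    """Pick the next arm (leaf) to sample under standard D-Tracking.

    counts:  N_a(t), how many times each arm has been sampled so far.
    weights: w_a, target sampling proportions summing to 1
             (assumed to be supplied by an outer Track-and-Stop loop).
    """
    k = len(counts)
    t = sum(counts)  # total number of samples drawn so far
    # Forced exploration: arms sampled fewer than sqrt(t) - K/2 times.
    under_sampled = [a for a in range(k) if counts[a] < math.sqrt(t) - k / 2]
    if under_sampled:
        return min(under_sampled, key=lambda a: counts[a])
    # Tracking: pull the arm whose count lags furthest behind t * w_a.
    # Note this scans every arm, so each round costs O(K) in this sketch.
    return max(range(k), key=lambda a: t * weights[a] - counts[a])
```

For example, with `counts=[10, 5, 5]` and `weights=[0.2, 0.5, 0.3]`, no arm triggers forced exploration and the rule returns arm 1, whose deficit `20 * 0.5 - 5` is largest.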

Findings

01

Asymptotically optimal sample complexity achieved.

02

Significant empirical improvements over previous methods.

03

Reduced per-round computational cost from linear to logarithmic.

Abstract

We introduce the Thresholding Monte Carlo Tree Search problem, in which, given a tree $T$ and a threshold $\theta$, a player must answer whether the root node value of $T$ is at least $\theta$ or not. In the given tree, each internal node is labeled 'MAX' or 'MIN', and the value of a 'MAX'-labeled ('MIN'-labeled) internal node is the maximum (minimum) of its child values. The value of a leaf node is the mean reward of an unknown distribution, from which the player can sample rewards. For this problem, we develop a $\delta$-correct sequential sampling algorithm based on the Track-and-Stop strategy that has asymptotically optimal sample complexity. We show that a ratio-based modification of the D-Tracking arm-pulling strategy leads to a substantial improvement in empirical sample complexity, as well as reducing the per-round computational cost from linear to…
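To make the problem definition concrete, the sketch below evaluates a MAX/MIN tree and answers the thresholding question in the idealized case where every leaf mean is known. In the actual problem the leaf means are unknown and must be estimated by sampling, which is what the paper's algorithm addresses; the `Node` class and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # 'MAX' or 'MIN' for internal nodes; None for leaves (hypothetical field names).
    label: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    # Leaf value, i.e. the mean reward. Known here only for illustration;
    # in the actual problem it is unknown and must be estimated by sampling.
    mean: float = 0.0


def value(node: Node) -> float:
    """Recursively evaluate a MAX/MIN tree from its leaf means."""
    if not node.children:
        return node.mean
    child_values = [value(child) for child in node.children]
    if node.label == "MAX":
        return max(child_values)
    if node.label == "MIN":
        return min(child_values)
    raise ValueError(f"internal node has unknown label: {node.label!r}")


def root_at_least(root: Node, theta: float) -> bool:
    """Answer the thresholding question: is the root value >= theta?"""
    return value(root) >= theta


# Example: a MAX root over two MIN nodes, each with two leaves.
tree = Node("MAX", [
    Node("MIN", [Node(mean=0.3), Node(mean=0.7)]),  # value 0.3
    Node("MIN", [Node(mean=0.5), Node(mean=0.6)]),  # value 0.5
])
```

With these leaf means, `value(tree)` is 0.5, so `root_at_least(tree, 0.4)` is `True` and `root_at_least(tree, 0.55)` is `False`. A sampling algorithm only needs enough accuracy on the leaves that determine which side of $\theta$ the root falls on, rather than an exact root value.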


Taxonomy

Topics: Artificial Intelligence in Games · Optimization and Search Problems · Advanced Bandit Algorithms Research