$\varepsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree Search
Yinan Li, Tuan Nguyen, Kwang-Sung Jun

TL;DR
This paper introduces an $\varepsilon$-agnostic algorithm for fixed-budget max-min action identification in trees, providing instance-dependent error bounds and new theoretical guarantees for Monte Carlo Tree Search.
Contribution
It presents the first fixed-budget algorithm with provable guarantees for max-min action identification that does not require $\varepsilon$ as input, with error bounds expressed in terms of instance-specific gaps.
Findings
The misidentification probability decays exponentially in the sampling budget.
The algorithm recovers known guarantees for special cases such as best-arm identification.
The analysis yields new $\varepsilon$-good guarantees for Successive Rejects.
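For readers unfamiliar with Successive Rejects, the following is a minimal sketch of the classical fixed-budget best-arm identification procedure of that name (Audibert and Bubeck), which corresponds to the special case where each subtree has a single leaf. It is not the paper's tree algorithm. The function name, the `pull` callback, and the parameter names are illustrative assumptions.

```python
import math


def successive_rejects(pull, num_arms, budget):
    """Classical Successive Rejects for fixed-budget best-arm identification.

    pull(i) is a hypothetical callback returning one stochastic reward for arm i.
    Returns the index of the single arm that survives all rejection phases.
    """
    if num_arms == 1:
        return 0
    if budget <= num_arms:
        raise ValueError("budget must exceed the number of arms")

    # Normalizing constant: 1/2 + sum_{i=2}^{K} 1/i.
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))

    active = list(range(num_arms))
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    prev_n = 0

    for phase in range(1, num_arms):
        # Cumulative number of pulls each surviving arm should have after this phase.
        n_k = math.ceil((budget - num_arms) / (log_bar * (num_arms + 1 - phase)))
        for arm in active:
            for _ in range(n_k - prev_n):
                sums[arm] += pull(arm)
                counts[arm] += 1
        prev_n = n_k
        # Reject the arm with the lowest empirical mean.
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)

    return active[0]
```

Note that the procedure never takes $\varepsilon$ as input: the phase lengths depend only on the number of arms and the budget, which is why an $\varepsilon$-good guarantee for it holds for every $\varepsilon$ at once.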
Abstract
We study the fixed-budget max-min action identification problem in depth-2 max-min trees, an important special case of Monte Carlo Tree Search. A learner sequentially allocates samples to leaves and then recommends a subtree whose minimum leaf value is largest. Motivated by approximate planning, we focus on $\varepsilon$-good subtree identification, where any subtree whose min value is within $\varepsilon$ of the optimal maximin value is acceptable. Our main contribution is an $\varepsilon$-agnostic algorithm: it does not require $\varepsilon$ as input, but achieves instance-dependent error bounds for every meaningful $\varepsilon$. We show that the misidentification probability decays exponentially in the sampling budget, at a rate governed by an instance-dependent complexity measure that captures both cross-subtree and within-subtree gaps. When each subtree has a single leaf, the problem reduces to standard…
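The $\varepsilon$-good criterion described in the abstract can be made concrete with a small sketch. It assumes the true leaf means are known, which a learner never has in practice, so it only illustrates which recommendations would count as correct. The function name and the example values are hypothetical.

```python
def epsilon_good_subtrees(leaf_means, eps):
    """Return indices of subtrees whose min leaf value is within eps of the maximin value.

    leaf_means[i] is the list of (assumed known) mean values of the leaves of subtree i.
    """
    # Value of a subtree in a depth-2 max-min tree: its smallest leaf mean.
    subtree_values = [min(leaves) for leaves in leaf_means]
    # Optimal maximin value: the largest subtree value.
    best = max(subtree_values)
    return [i for i, v in enumerate(subtree_values) if v >= best - eps]


# Three subtrees with min values 0.5, 0.6 and 0.2.
tree = [[0.9, 0.5], [0.6, 0.7], [0.95, 0.2]]
exact = epsilon_good_subtrees(tree, 0.0)     # only the optimal subtree
relaxed = epsilon_good_subtrees(tree, 0.15)  # subtree 0 also qualifies
```

With $\varepsilon = 0$ only the maximin-optimal subtree is acceptable; enlarging $\varepsilon$ grows the acceptable set, which is why a single bound that holds for every $\varepsilon$ is useful.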
