Bandit Algorithms for Tree Search
Pierre-Arnaud Coquelin, Remi Munos

TL;DR
This paper explores bandit algorithms for tree search, analyzing their theoretical properties and proposing new methods like BAST to improve exploration efficiency, especially in large or infinite trees, with applications to global optimization.
Contribution
It introduces and analyzes new bandit algorithms for tree search, including BAST, which adapt to smoothness and size of the tree, with proven regret bounds and incremental expansion strategies.
Findings
UCT can be over-optimistic with poor worst-case regret
Exponential confidence bounds improve tree search performance
BAST effectively exploits smoothness for efficient pruning
Abstract
Bandit based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of go [6]. Their efficient exploration of the tree enables to return rapidly a good value, and improve precision if more time is provided. The UCT algorithm [8], a tree search method based on Upper Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is "over-optimistic" in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, a modification of UCT using a confidence sequence that scales exponentially in the horizon depth is analyzed. We then consider Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Then, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST)…
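To make the UCB idea behind UCT concrete, here is a minimal Python sketch of how a node might choose which child to explore next. It assumes the standard UCB1 index (empirical mean plus a bonus of sqrt(2 ln n / n_i)); the function names, the (total_reward, visits) layout, and the constant 2 are illustrative assumptions, not the exact confidence sequence analyzed in the paper.

```python
import math


def ucb1_score(mean_reward, child_visits, parent_visits):
    """Standard UCB1 index: empirical mean plus an exploration bonus.

    Assumption: the bonus uses the usual sqrt(2 ln n / n_i) form.
    """
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    bonus = math.sqrt(2.0 * math.log(parent_visits) / child_visits)
    return mean_reward + bonus


def select_child(children, parent_visits):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_reward, visits) pairs; this layout
    is hypothetical and chosen only to keep the sketch short.
    """
    best_index, best_score = 0, -math.inf
    for i, (total_reward, visits) in enumerate(children):
        mean = total_reward / visits if visits > 0 else 0.0
        score = ucb1_score(mean, visits, parent_visits)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

A UCT-style search would apply `select_child` repeatedly from the root down to a leaf, then update the visit counts and reward totals along that path. A rarely visited child can win the selection despite a lower empirical mean, because its bonus term is larger.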
Taxonomy
Topics: Artificial Intelligence in Games · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics
