Bandit Algorithms for Tree Search

Pierre-Arnaud Coquelin; Rémi Munos

arXiv:1408.2028·cs.AI·August 12, 2014·179 cites

TL;DR

This paper explores bandit algorithms for tree search, analyzing their theoretical properties and proposing new methods like BAST to improve exploration efficiency, especially in large or infinite trees, with applications to global optimization.

Contribution

It introduces and analyzes new bandit algorithms for tree search, including BAST, which adapt to smoothness and size of the tree, with proven regret bounds and incremental expansion strategies.

Findings

1. UCT can be over-optimistic, with poor worst-case regret

2. Exponential confidence bounds improve tree search performance

3. BAST effectively exploits smoothness for efficient pruning

Abstract

Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree enables them to rapidly return a good value, and to improve precision if more time is provided. The UCT algorithm [8], a tree search method based on Upper Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is "over-optimistic" in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, a modification of UCT using a confidence sequence that scales exponentially in the horizon depth is analyzed. We then consider Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Then, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST)…
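
To make "Flat-UCB performed on the leaves" concrete, here is a minimal Python sketch that treats every leaf as an independent arm and repeatedly pulls the leaf with the highest upper confidence bound. It is an illustration under stated assumptions, not the paper's algorithm: it uses the standard UCB1 index (empirical mean plus sqrt(2 ln t / n)) rather than the confidence sequence analyzed in the paper, the rewards are assumed to be Bernoulli, and the names ucb1_index and flat_ucb are hypothetical.

```python
import math
import random


def ucb1_index(mean, pulls, total_pulls, c=math.sqrt(2)):
    """Standard UCB1 index for one arm; an arm never pulled gets +inf."""
    if pulls == 0:
        return math.inf
    return mean + c * math.sqrt(math.log(total_pulls) / pulls)


def flat_ucb(leaf_means, rounds, seed=0):
    """Run UCB1 directly on the leaves, ignoring the tree structure.

    leaf_means: assumed Bernoulli success probability of each leaf
    (unknown to the algorithm, used only to simulate rewards).
    Returns the number of times each leaf was pulled.
    """
    rng = random.Random(seed)
    n = len(leaf_means)
    pulls = [0] * n
    sums = [0.0] * n
    for t in range(1, rounds + 1):
        # Compute the index of every leaf from its empirical mean.
        indices = [
            ucb1_index(sums[i] / pulls[i] if pulls[i] else 0.0, pulls[i], t)
            for i in range(n)
        ]
        best = max(range(n), key=indices.__getitem__)
        # Simulate a Bernoulli reward for the chosen leaf.
        reward = 1.0 if rng.random() < leaf_means[best] else 0.0
        pulls[best] += 1
        sums[best] += reward
    return pulls


if __name__ == "__main__":
    # Four leaves; the second one has the highest mean reward.
    print(flat_ucb([0.2, 0.8, 0.5, 0.3], rounds=2000))
```

Because every leaf is a separate arm, the number of arms grows with the size of the tree. This flat approach shares no information between leaves, whereas the tree-based methods named in the abstract (UCT, BAST) use the tree structure.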

Taxonomy

Topics: Artificial Intelligence in Games · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics