Gaussian Process Bandits for Tree Search: Theory and Application to Planning in Discounted MDPs
Louis Dorard, John Shawe-Taylor

TL;DR
This paper introduces GPTS, a Gaussian Process-based Tree Search algorithm for planning in discounted MDPs, providing theoretical regret bounds and demonstrating practical effectiveness in Open Loop Planning tasks.
Contribution
The paper develops GPTS, a novel tree search algorithm leveraging Gaussian Processes, with efficient implementation and regret analysis, applied to planning in discounted MDPs.
Findings
Regret grows as the square root of the number of iterations T, up to a logarithmic factor, with a constant that improves for larger Gaussian kernel widths.
GPTS achieves regret bounds similar to those of existing algorithms such as OLOP.
Practical application to Open Loop Planning shows promising results.
Abstract
We motivate and analyse a new Tree Search algorithm, GPTS, based on recent theoretical advances in the use of Gaussian Processes for Bandit problems. We consider tree paths as arms and we assume the target/reward function is drawn from a GP distribution. The posterior mean and variance, after observing data, are used to define confidence intervals for the function values, and we sequentially play arms with the highest upper confidence bounds. We give an efficient implementation of GPTS and we adapt previous regret bounds by determining the decay rate of the eigenvalues of the kernel matrix on the whole set of tree paths. We consider two kernels in the feature space of binary vectors indexed by the nodes of the tree: linear and Gaussian. The regret grows as the square root of the number of iterations T, up to a logarithmic factor, with a constant that improves with bigger Gaussian kernel widths.…
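The loop described in the abstract (paths as arms, a GP prior over rewards, posterior mean and variance turned into upper confidence bounds) can be illustrated with a small brute-force sketch. It is not the authors' efficient GPTS implementation. It enumerates every root-to-leaf path of a complete tree, so it only suits tiny trees. The helper names (`path_features`, `linear_kernel`, `gaussian_kernel`, `gp_posterior`, `gp_ucb_tree`) are hypothetical. The sketch also assumes three things the page does not spell out: the feature vector indexes the non-root nodes, observations carry Gaussian noise, and the exploration weight follows the finite-arm GP-UCB schedule of Srinivas et al.

```python
import itertools
import numpy as np

def path_features(branching, depth):
    """Enumerate all root-to-leaf paths (the arms) and map each one to a 0/1
    vector indexed by the non-root nodes of the tree. A node is identified by
    the action prefix that reaches it (assumption: the root is not indexed)."""
    paths = list(itertools.product(range(branching), repeat=depth))
    nodes = {}
    for p in paths:
        for d in range(1, depth + 1):
            nodes.setdefault(p[:d], len(nodes))
    X = np.zeros((len(paths), len(nodes)))
    for i, p in enumerate(paths):
        for d in range(1, depth + 1):
            X[i, nodes[p[:d]]] = 1.0
    return paths, X

def linear_kernel(A, B):
    # For path vectors this counts the non-root nodes two paths share.
    return A @ B.T

def gaussian_kernel(A, B, width):
    # exp(-||a - b||^2 / (2 width^2)); larger width means slower decay.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * width ** 2))

def gp_posterior(K_all, obs_idx, y, noise):
    """Posterior mean and variance at every arm given noisy observations."""
    if not obs_idx:
        return np.zeros(len(K_all)), np.diag(K_all).copy()
    K_oo = K_all[np.ix_(obs_idx, obs_idx)] + noise ** 2 * np.eye(len(obs_idx))
    K_ao = K_all[:, obs_idx]
    L = np.linalg.cholesky(K_oo)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_ao @ alpha
    v = np.linalg.solve(L, K_ao.T)
    var = np.diag(K_all) - (v ** 2).sum(0)
    return mean, np.maximum(var, 0.0)

def gp_ucb_tree(reward, branching=2, depth=4, T=30, kernel="linear",
                width=1.0, noise=0.1, delta=0.1, seed=0):
    """Play T paths, each time choosing the one with the highest UCB."""
    rng = np.random.default_rng(seed)
    paths, X = path_features(branching, depth)
    if kernel == "linear":
        K_all = linear_kernel(X, X)
    else:
        K_all = gaussian_kernel(X, X, width)
    obs_idx, y = [], []
    for t in range(1, T + 1):
        mean, var = gp_posterior(K_all, obs_idx, np.array(y), noise)
        # Finite-arm GP-UCB schedule (assumption, not taken from the paper).
        beta = 2 * np.log(len(paths) * t ** 2 * np.pi ** 2 / (6 * delta))
        i = int(np.argmax(mean + np.sqrt(beta * var)))
        obs_idx.append(i)
        y.append(reward(paths[i]) + noise * rng.standard_normal())
    return paths, obs_idx, y
```

For example, `gp_ucb_tree(lambda p: sum(p) / len(p), kernel="gaussian", width=2.0)` plays 30 paths on a binary tree of depth 4 with a toy reward. Computing the kernel from explicit feature vectors keeps the sketch close to the abstract's definition. An efficient implementation would instead exploit the fact that both kernels depend only on how many nodes two paths share.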
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms
