Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling
Jack Sandberg, Morteza Haghir Chehreghani

TL;DR
This paper introduces two algorithms for adaptive prior selection in Gaussian process bandits using Thompson sampling, providing theoretical regret bounds and demonstrating improved performance over existing methods.
Contribution
The paper proposes novel algorithms for joint prior selection and regret minimization in GP bandits, with theoretical analysis and empirical validation.
Findings
Both algorithms achieve sublinear regret bounds.
The methods outperform standard hyperparameter tuning approaches.
Algorithms are effective on synthetic and real-world data.
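For reference, "sublinear regret" is usually stated in terms of cumulative regret over T rounds. The notation below (f, x_t, x^*, R_T) is assumed here for illustration and is not taken from the paper, which may bound a Bayesian (expected) variant instead.

```latex
R_T = \sum_{t=1}^{T} \bigl( f(x^*) - f(x_t) \bigr),
\qquad x^* \in \arg\max_{x \in \mathcal{X}} f(x),
\qquad \text{sublinear regret: } \lim_{T \to \infty} \frac{R_T}{T} = 0 .
```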
Abstract
Gaussian process (GP) bandits provide a powerful framework for performing blackbox optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the literature assumes that this prior is known, but in practice this seldom holds. Instead, practitioners often rely on maximum likelihood estimation to select the hyperparameters of the prior, which lacks theoretical guarantees. In this work, we propose two algorithms for joint prior selection and regret minimization in GP bandits based on GP Thompson sampling (GP-TS): Prior-Elimination GP-TS (PE-GP-TS), which disqualifies priors with poor predictive performance, and HyperPrior GP-TS (HP-GP-TS), which utilizes a bi-level Thompson sampling scheme. We theoretically analyze the algorithms and establish upper bounds for their respective regret. In addition, we demonstrate the…
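The bi-level idea behind HP-GP-TS can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a finite 1-D grid of arms, a finite set of candidate priors that differ only in an RBF lengthscale, a zero prior mean, a uniform hyperprior, and known Gaussian noise. All function and parameter names (`hp_gp_ts`, `reward_fn`, `lengthscales`, and so on) are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def log_marginal_likelihood(K_obs, y, noise_var):
    """log N(y | 0, K_obs + noise_var * I), computed via a Cholesky factor."""
    n = len(y)
    L = np.linalg.cholesky(K_obs + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def gp_posterior(K, idx, y, noise_var):
    """Posterior mean and covariance on the grid given rewards y at grid indices idx."""
    if len(idx) == 0:
        return np.zeros(K.shape[0]), K
    K_oo = K[np.ix_(idx, idx)] + noise_var * np.eye(len(idx))
    K_go = K[:, idx]
    sol = np.linalg.solve(K_oo, K_go.T)  # shape (n_obs, n_grid)
    return sol.T @ y, K - K_go @ sol

def sample_mvn(rng, mean, cov):
    """Draw from N(mean, cov) via an eigendecomposition, clipping tiny negative eigenvalues."""
    w, V = np.linalg.eigh((cov + cov.T) / 2)
    return mean + V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(len(mean)))

def hp_gp_ts(reward_fn, grid, lengthscales, n_rounds, noise_var=0.01, seed=0):
    """Bi-level Thompson sampling sketch: sample a prior, then sample a function."""
    rng = np.random.default_rng(seed)
    kernels = [rbf_kernel(grid, grid, ls) for ls in lengthscales]
    log_hyperprior = np.full(len(kernels), -np.log(len(kernels)))  # uniform (assumption)
    idx, y = [], []
    probs = np.exp(log_hyperprior)
    for _ in range(n_rounds):
        y_arr = np.array(y, dtype=float)
        # Level 1: posterior over candidate priors given the data so far.
        log_post = log_hyperprior.copy()
        if idx:
            for k, K in enumerate(kernels):
                log_post[k] += log_marginal_likelihood(K[np.ix_(idx, idx)], y_arr, noise_var)
        probs = np.exp(log_post - log_post.max())
        probs /= probs.sum()
        k = rng.choice(len(kernels), p=probs)
        # Level 2: sample a function from the GP posterior under the sampled prior.
        mean, cov = gp_posterior(kernels[k], idx, y_arr, noise_var)
        f_sample = sample_mvn(rng, mean, cov)
        # Play the maximizer of the sampled function and record the noisy reward.
        i = int(np.argmax(f_sample))
        idx.append(i)
        y.append(float(reward_fn(grid[i], rng)))
    return idx, y, probs

# Toy usage: the objective and noise level below are illustrative only.
grid = np.linspace(0.0, 1.0, 30)
chosen, rewards, prior_probs = hp_gp_ts(
    lambda x, rng: np.sin(3 * x) + 0.1 * rng.standard_normal(),
    grid, lengthscales=[0.05, 0.2, 1.0], n_rounds=20,
)
```

Sampling the prior from its posterior, rather than plugging in a maximum likelihood estimate, keeps uncertainty about the prior in the exploration step. A PE-GP-TS-style variant would instead maintain a set of active priors and remove those whose predictions fit the observed rewards poorly; the elimination criterion is not given in the abstract, so it is not sketched here.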
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Distributed Sensor Networks and Detection Algorithms
