Time-Varying Gaussian Process Bandit Optimization
Ilija Bogunovic, Jonathan Scarlett, Volkan Cevher

TL;DR
This paper introduces two extensions of the Gaussian process upper confidence bound (GP-UCB) algorithm for time-varying reward functions, proves regret bounds for both, and shows improved performance over classical GP-UCB on synthetic and real data.
Contribution
The paper proposes the R-GP-UCB and TV-GP-UCB algorithms for Bayesian optimization in non-stationary environments, together with regret bounds that characterize the trade-off between the time horizon and the rate at which the reward function varies.
Findings
TV-GP-UCB outperforms R-GP-UCB in practice.
Both algorithms outperform classical GP-UCB.
The regret bounds explicitly characterize the trade-off between the time horizon and the rate at which the function varies.
Abstract
We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution consists of novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB.…
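To make the difference between sharp resetting and smooth forgetting concrete, here is a minimal Python sketch. It is illustrative rather than the authors' implementation, and it rests on several assumptions not stated in the abstract: a squared-exponential kernel on 1-D inputs, a unit prior variance, a forgetting rate `eps` that discounts the correlation between times s and t by (1 - eps)^(|s - t| / 2), and a fixed exploration weight `beta`. All function names and default values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel between 1-D input arrays (assumed kernel choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def posterior(x_obs, y_obs, t_obs, x_cand, t_now, eps, noise=0.01):
    # GP posterior mean and standard deviation at candidates x_cand for time t_now.
    # Smooth forgetting (assumed form): the spatial kernel is multiplied by
    # (1 - eps)^(|s - t| / 2), so older observations carry less information.
    if len(x_obs) == 0:
        return np.zeros(len(x_cand)), np.ones(len(x_cand))
    decay_obs = (1.0 - eps) ** (np.abs(t_obs[:, None] - t_obs[None, :]) / 2.0)
    decay_new = (1.0 - eps) ** (np.abs(t_now - t_obs) / 2.0)
    K = rbf_kernel(x_obs, x_obs) * decay_obs + noise * np.eye(len(x_obs))
    k = rbf_kernel(x_obs, x_cand) * decay_new[:, None]
    mean = k.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def ucb_choice(mean, std, beta=4.0):
    # Index of the candidate maximizing the upper confidence bound.
    return int(np.argmax(mean + np.sqrt(beta) * std))

def reset_mask(t_obs, t_now, block):
    # Sharp resetting: keep only observations from the current block of
    # `block` rounds; everything earlier is discarded.
    start = (t_now // block) * block
    return t_obs >= start
```

With `eps=0` the decay factors all equal 1 and `posterior` reduces to the standard GP posterior used by GP-UCB. A resetting variant would instead call `posterior` with `eps=0` on only the observations selected by `reset_mask`. In both cases uncertainty about stale regions grows back over time, which is what drives renewed exploration.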
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Advanced Multi-Objective Optimization Algorithms
Methods: Gaussian Process
