Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance

Shogo Iwazaki; Shion Takeno

arXiv:2502.06363 · cs.LG · February 11, 2025

TL;DR

This paper advances the analysis of Gaussian process bandit algorithms by providing tighter regret bounds in noiseless, RKHS-norm constrained, and non-stationary noise settings, achieving near-optimal performance.

Contribution

It introduces a new upper bound on the maximum posterior variance with improved dependence on the noise variance parameters of the GP, and uses it to refine the MVR and PE algorithms.

Findings

1. Achieves nearly optimal regret bounds in noiseless scenarios.
2. Provides regret bounds that depend optimally on the RKHS norm.
3. Extends analysis to non-stationary noise variance, matching lower bounds.

Abstract

We study the Gaussian process (GP) bandit problem, whose goal is to minimize regret under an unknown reward function lying in some reproducing kernel Hilbert space (RKHS). The maximum posterior variance analysis is vital in analyzing near-optimal GP bandit algorithms such as maximum variance reduction (MVR) and phased elimination (PE). Therefore, we first show the new upper bound of the maximum posterior variance, which improves the dependence of the noise variance parameters of the GP. By leveraging this result, we refine the MVR and PE to obtain (i) a nearly optimal regret upper bound in the noiseless setting and (ii) regret upper bounds that are optimal with respect to the RKHS norm of the reward function. Furthermore, as another application of our proposed bound, we analyze the GP bandit under the time-varying noise variance setting, which is the kernelized extension of the linear…
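The role of the maximum posterior variance in MVR can be illustrated with a small numerical sketch. The code below is not the paper's refined algorithm or its bound. It is a minimal illustration that assumes a one-dimensional candidate grid, a squared-exponential kernel, and a fixed noise variance parameter; all function names and parameter values are hypothetical. Each round queries the candidate with the largest GP posterior variance and records that maximum, which is the quantity the analysis bounds.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel on 1-D inputs (an illustrative choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def posterior_variance(candidates, queried, noise_var, lengthscale=0.2):
    # GP posterior variance at each candidate given the queried inputs.
    # It depends only on the input locations, not on the observed rewards.
    prior = np.ones(len(candidates))  # k(x, x) = 1 for this kernel
    if len(queried) == 0:
        return prior
    k_qq = rbf_kernel(queried, queried, lengthscale)
    k_cq = rbf_kernel(candidates, queried, lengthscale)
    gram = k_qq + noise_var * np.eye(len(queried))
    solved = np.linalg.solve(gram, k_cq.T)
    var = prior - np.sum(k_cq * solved.T, axis=1)
    return np.clip(var, 0.0, None)  # guard against tiny negative round-off

def mvr_max_variance(candidates, num_rounds, noise_var, lengthscale=0.2):
    # Maximum variance reduction: repeatedly query the most uncertain point.
    queried = np.empty(0)
    history = []
    for _ in range(num_rounds):
        var = posterior_variance(candidates, queried, noise_var, lengthscale)
        history.append(var.max())  # maximum posterior variance before this query
        queried = np.append(queried, candidates[np.argmax(var)])
    return queried, np.array(history)

# Assumed setup: 101 grid points on [0, 1], 15 rounds, noise variance 1e-2.
candidates = np.linspace(0.0, 1.0, 101)
queried, max_var = mvr_max_variance(candidates, num_rounds=15, noise_var=1e-2)
```

Here `max_var[t]` is the maximum posterior variance over the grid after `t` queries. Rerunning with a smaller `noise_var` gives a rough feel for how this quantity depends on the noise variance parameter, which is the dependence the abstract says the new bound improves.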

Videos

Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance · SlidesLive

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research · Air Quality Monitoring and Forecasting