Lenient Regret and Good-Action Identification in Gaussian Process Bandits
Xu Cai, Selwyn Gomes, Jonathan Scarlett

TL;DR
This paper introduces lenient regret concepts for Gaussian process bandits, providing theoretical bounds and practical algorithms for identifying good actions efficiently under relaxed optimality criteria.
Contribution
It presents new lenient regret notions, upper and lower bounds, and algorithms for good-action identification leveraging threshold knowledge.
Findings
Upper bounds on lenient regret for GP-UCB and elimination algorithms
Algorithm-independent lower bounds on lenient regret
Algorithms that exploit knowledge of the threshold for faster good-action identification
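The idea of exploiting a known threshold can be illustrated with a minimal sketch. It is not one of the paper's algorithms: it assumes a 1-D grid of actions, a squared-exponential kernel, and a fixed confidence width `beta`, and it stops as soon as some action's lower confidence bound clears the threshold. The names `rbf`, `gp_posterior`, and `find_good_action` are hypothetical.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel on 1-D inputs (unit prior variance).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise):
    # Standard GP regression posterior mean and standard deviation on the grid.
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_grid, x_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def find_good_action(f, x_grid, threshold, beta=2.0, max_rounds=50,
                     noise=1e-2, seed=0):
    # Assumption: beta is a fixed heuristic width, not a theoretically tuned one.
    rng = np.random.default_rng(seed)
    x_obs, y_obs = np.empty(0), np.empty(0)
    for t in range(max_rounds):
        if len(x_obs) == 0:
            mean, std = np.zeros(len(x_grid)), np.ones(len(x_grid))
        else:
            mean, std = gp_posterior(x_obs, y_obs, x_grid, noise)
        lcb = mean - beta * std
        best = int(np.argmax(lcb))
        if lcb[best] >= threshold:
            # Some action is confidently "good enough": stop early.
            return x_grid[best], t
        # Otherwise query the action with the highest upper confidence bound.
        x_next = x_grid[int(np.argmax(mean + beta * std))]
        y_next = f(x_next) + np.sqrt(noise) * rng.standard_normal()
        x_obs, y_obs = np.append(x_obs, x_next), np.append(y_obs, y_next)
    return None, max_rounds
```

Calling `find_good_action(lambda x: np.sin(3 * x), np.linspace(0, 1, 101), threshold=0.5)` returns a grid point together with the number of queries used, or `None` if no action is certified within `max_rounds`. Unlike a pure optimizer, the loop has no reason to keep refining its estimate of the maximizer once the threshold is met.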
Abstract
In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is "good enough". On the theoretical side, we study various lenient regret notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circumventing the usual $O(\sqrt{T})$ term (with time horizon $T$) resulting from zooming extremely close towards the function maximum. In addition, we complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single "good action" according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can often…
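To make "near-optimal actions incur zero penalty" concrete, the sketch below compares standard cumulative regret with three lenient variants (indicator, large-gap, and hinge) that are common in the lenient-regret literature. The abstract does not spell out its definitions, so these forms, the function name `lenient_regrets`, and the choice of "good enough" as being within `eps` of the maximum are assumptions made for illustration.

```python
import numpy as np

def lenient_regrets(f_values, f_max, eps):
    # Gap of each played action to the maximum function value.
    gaps = f_max - np.asarray(f_values, dtype=float)
    # Assumption: an action is "bad" only if its gap exceeds eps.
    bad = gaps > eps
    return {
        "standard": float(gaps.sum()),                  # every gap counts
        "indicator": int(bad.sum()),                    # number of bad rounds
        "large_gap": float(gaps[bad].sum()),            # full gap, bad rounds only
        "hinge": float(np.clip(gaps - eps, 0.0, None).sum()),  # gap beyond eps
    }

# Function values of four played actions, with maximum 1.0 and tolerance 0.1.
print(lenient_regrets([0.2, 0.7, 0.95, 1.0], f_max=1.0, eps=0.1))
```

In this example the last two actions lie within `eps` of the maximum, so they contribute to the standard regret but to none of the lenient variants.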
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Advanced Multi-Objective Optimization Algorithms
Methods: Gaussian Process
