Efficient kernelized bandit algorithms via exploration distributions
Bingshan Hu, Zheng He, Danica J. Sutherland

TL;DR
This paper introduces a flexible class of computationally efficient kernelized bandit algorithms built on exploration distributions. The algorithms achieve regret bounds that match known results for UCB- and Thompson Sampling-based methods, and randomization can improve performance in practice.
Contribution
The paper proposes GP-Generic, a framework for kernelized bandit algorithms based on exploration distributions. It unifies existing approaches and extends them to a broader family of randomized algorithms.
Findings
Achieves $\tilde{O}(\text{complexity} \times \sqrt{T})$ regret bounds.
Includes UCB and Thompson Sampling as special cases.
Randomized algorithms can outperform deterministic ones in practice.
Abstract
We consider a kernelized bandit problem with a compact arm set and a fixed but unknown reward function with a finite norm in some Reproducing Kernel Hilbert Space (RKHS). We propose a class of computationally efficient kernelized bandit algorithms, which we call GP-Generic, based on a novel concept: exploration distributions. This class of algorithms includes Upper Confidence Bound-based approaches as a special case, but also allows for a variety of randomized algorithms. With careful choice of exploration distribution, our proposed generic algorithm realizes a wide range of concrete algorithms that achieve $\tilde{O}(\gamma_T\sqrt{T})$ regret bounds, where $\gamma_T$ characterizes the RKHS complexity. This matches known results for UCB- and Thompson Sampling-based algorithms; we also show that in practice, randomization can yield better practical…
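The idea of an exploration distribution can be illustrated with a small sketch. This is not the paper's GP-Generic algorithm, whose exact form is not given in the abstract. It is one plausible reading, under stated assumptions: each round, a weight per arm is drawn from an exploration distribution and scales a Gaussian-process posterior standard deviation that is added to the posterior mean. The names `run_bandit`, `sample_weights`, `ucb_weights`, `gaussian_weights`, the RBF kernel, the finite grid of arms, and the values of `beta` and `noise` are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel between two sets of 1-D points (assumed kernel).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=0.1):
    # Standard GP regression posterior mean and standard deviation on x_grid.
    if len(x_obs) == 0:
        return np.zeros(len(x_grid)), np.ones(len(x_grid))
    K = rbf_kernel(x_obs, x_obs) + noise ** 2 * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_grid)  # shape (n_obs, n_grid)
    mean = k_star.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def run_bandit(reward_fn, x_grid, horizon, sample_weights, rng, beta=2.0, noise=0.1):
    # Hypothetical loop: score each arm by mean + w * beta * std, where the
    # weights w come from an exploration distribution, then pull the best arm.
    x_obs, y_obs = np.empty(0), np.empty(0)
    for _ in range(horizon):
        mean, std = gp_posterior(x_obs, y_obs, x_grid, noise)
        w = sample_weights(rng, len(x_grid))
        idx = int(np.argmax(mean + w * beta * std))
        y = reward_fn(x_grid[idx]) + noise * rng.standard_normal()
        x_obs = np.append(x_obs, x_grid[idx])
        y_obs = np.append(y_obs, y)
    return x_obs, y_obs

def ucb_weights(rng, n):
    # Point mass at 1: the score reduces to a deterministic UCB-style index.
    return np.ones(n)

def gaussian_weights(rng, n):
    # Independent standard-normal weights: one possible randomized variant.
    return rng.standard_normal(n)

rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 1.0, 50)       # finite grid standing in for a compact arm set
reward_fn = lambda x: np.sin(6.0 * x)    # toy reward function, assumed for the demo
xs_ucb, ys_ucb = run_bandit(reward_fn, x_grid, 30, ucb_weights, rng)
xs_rand, ys_rand = run_bandit(reward_fn, x_grid, 30, gaussian_weights, rng)
```

Swapping `sample_weights` is the only change needed to move between the deterministic and randomized variants, which is the sense in which a single generic loop can cover several concrete algorithms. Whether a given weight distribution satisfies the conditions needed for the stated regret bound is a question for the paper's analysis, not something this sketch establishes.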
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs
Methods: Sparse Evolutionary Training
