Efficient Kernel UCB for Contextual Bandits

Houssam Zenati; Alberto Bietti; Eustache Diemert; Julien Mairal; Matthieu Martin; Pierre Gaillard

arXiv:2202.05638·cs.LG·February 14, 2022

TL;DR

This paper introduces an efficient kernelized UCB algorithm for contextual bandits that uses incremental Nyström approximations to cut the computational complexity from O(CT^3) to O(CTm^2), making large-scale applications feasible.

Contribution

The authors develop a scalable kernel UCB method based on incremental Nyström approximations, reducing complexity from O(CT^3) to O(CTm^2), where T is the horizon and m is the number of Nyström points. The cost is linear in T when m is nearly constant.

Findings

01

Achieves O(CTm^2) complexity with Nyström approximation

02

Recovers the same regret as standard kernel UCB when m is of the order of the effective dimension

03

The effective dimension, which sets the required m, is at most O(√T) and nearly constant in some cases

Abstract

In this paper, we tackle the computational efficiency of kernelized UCB algorithms in contextual bandits. While standard methods require an O(CT^3) complexity, where T is the horizon and the constant C is related to optimizing the UCB rule, we propose an efficient contextual algorithm for large-scale problems. Specifically, our method relies on incremental Nyström approximations of the joint kernel embedding of contexts and actions. This allows us to achieve a complexity of O(CTm^2), where m is the number of Nyström points. To recover the same regret as the standard kernelized UCB algorithm, m needs to be of the order of the effective dimension of the problem, which is at most O(√T) and nearly constant in some cases.
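To make the O(CTm^2) figure concrete, here is a minimal Python sketch of a UCB rule computed on Nyström features. It is an illustration under simplifying assumptions, not the paper's algorithm: the dictionary of m anchor points is fixed rather than built incrementally, the kernel is Gaussian, the exploration width is the plain linear-UCB width in the m-dimensional feature space, and the names `NystromUCB`, `rbf_kernel`, `lam`, `beta`, and `bandwidth` are all hypothetical. Each update costs O(m^2) through a Sherman-Morrison rank-one update, and scoring C candidate context-action pairs costs O(Cm^2) per round.

```python
import numpy as np


def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


class NystromUCB:
    """Linear UCB on Nystrom features (illustrative, fixed dictionary)."""

    def __init__(self, anchors, lam=1.0, beta=1.0, bandwidth=1.0):
        self.anchors = anchors      # (m, d) joint context-action anchor points
        self.beta = beta            # exploration weight (assumed constant)
        self.bandwidth = bandwidth
        m = anchors.shape[0]
        K_mm = rbf_kernel(anchors, anchors, bandwidth)
        # Inverse square root of K_mm via eigendecomposition; eigenvalues
        # are clipped from below for numerical stability.
        evals, evecs = np.linalg.eigh(K_mm)
        evals = np.maximum(evals, 1e-10)
        self.K_mm_inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.T
        self.A_inv = np.eye(m) / lam   # inverse of lam * I + sum phi phi^T
        self.b = np.zeros(m)           # sum of reward * phi

    def features(self, Z):
        """Map rows of Z to m-dimensional Nystrom features."""
        return rbf_kernel(Z, self.anchors, self.bandwidth) @ self.K_mm_inv_sqrt

    def ucb(self, Z):
        """UCB scores for C candidate rows of Z in O(C m^2)."""
        Phi = self.features(Z)
        mean = Phi @ (self.A_inv @ self.b)
        var = np.einsum("ij,jk,ik->i", Phi, self.A_inv, Phi)
        return mean + self.beta * np.sqrt(np.maximum(var, 0.0))

    def update(self, z, reward):
        """Rank-one Sherman-Morrison update of A_inv in O(m^2)."""
        phi = self.features(z[None, :])[0]
        A_inv_phi = self.A_inv @ phi
        self.A_inv -= np.outer(A_inv_phi, A_inv_phi) / (1.0 + phi @ A_inv_phi)
        self.b += reward * phi
```

In a bandit loop one would call `ucb` on the candidate context-action pairs, play the argmax, and pass the played pair and its observed reward to `update`. Because the dictionary here never grows, the sketch shows only the per-round cost structure, not the incremental dictionary construction or the regret guarantee described in the abstract.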

Code & Models

Repositories

criteo-research/efficient-kernel-ucb (JAX, official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data