Adversarial Contextual Bandits Go Kernelized
Gergely Neu, Julia Olkhovskaya, Sattar Vakili

TL;DR
This paper extends adversarial linear contextual bandits to kernelized loss functions, proposing an efficient algorithm with near-optimal regret bounds that adapt to different eigenvalue decay rates of the kernel.
Contribution
It introduces a new, computationally efficient algorithm for kernelized adversarial contextual bandits, built on a novel optimistically biased loss estimator, and achieves near-optimal regret under various eigenvalue decay assumptions.
Findings
Regret bound of O(KT^{(1/2)(1+1/c)}) for polynomial eigendecay with exponent c, where T is the number of rounds and K the number of actions
Regret bound of O(√T) for exponential eigendecay
Matches known lower bounds and improves upon previous bounds in kernelized adversarial bandits
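The two findings can be compared through the exponent of T alone. The following is a minimal Python sketch that ignores constants and logarithmic factors (an assumption, since the bounds are stated only up to O(·)). The helper names poly_regret_exponent and regret_scale are hypothetical and do not come from the paper. For polynomial eigendecay the exponent (1/2)(1+1/c) shrinks toward 1/2, the √T rate of the exponential case, as c grows. It stays below 1, meaning the bound is sublinear in T, only when c > 1.

```python
from fractions import Fraction
from typing import Optional


def poly_regret_exponent(c: Fraction) -> Fraction:
    """Exponent of T in the O(K T^{(1/2)(1+1/c)}) bound (polynomial eigendecay, exponent c)."""
    if c <= 0:
        raise ValueError("eigendecay exponent c must be positive")
    # Exact rational arithmetic: (1/2) * (1 + 1/c)
    return Fraction(1, 2) * (1 + 1 / c)


def regret_scale(T: float, K: int, c: Optional[Fraction] = None) -> float:
    """Rough regret scale, ignoring constants and log factors (hypothetical helper).

    c=None stands for exponential eigendecay, using the O(sqrt(T)) rate as stated
    in the findings; otherwise the polynomial-eigendecay rate K * T^exponent is used.
    """
    if c is None:
        return T ** 0.5
    return K * T ** float(poly_regret_exponent(c))


if __name__ == "__main__":
    # The exponent approaches 1/2 (the sqrt(T) rate) as c grows.
    for c in (Fraction(2), Fraction(4), Fraction(100)):
        print("c =", c, "->", poly_regret_exponent(c))
    # c = 2 -> 3/4
    # c = 4 -> 5/8
    # c = 100 -> 101/200
```

For example, with c = 2 the polynomial bound scales as K·T^{3/4}, while the exponential-eigendecay bound scales as √T. Using Fraction keeps the exponents exact, which makes the limit toward 1/2 easy to read off.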
Abstract
We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achieves near-optimal regret guarantees under a variety of eigenvalue decay assumptions made on the underlying kernel. Specifically, under the assumption of polynomial eigendecay with exponent c, the regret is O(KT^{(1/2)(1+1/c)}), where T denotes the number of rounds and K the number of actions. Furthermore, when the eigendecay follows an exponential pattern, we achieve an even tighter regret bound of O(√T). These rates match the lower bounds…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning
