Contextual Combinatorial Bandits with Changing Action Sets via Gaussian Processes
Andi Nika, Sepehr Elahi, Cem Tekin

TL;DR
This paper introduces a Gaussian Process-based algorithm for contextual combinatorial bandits with changing action sets, proves a high-probability regret bound for it, and reports that it outperforms existing methods in experiments.
Contribution
The paper proposes the O'CLOK-UCB algorithm and its sparse GP variant for combinatorial bandits with dynamic action sets, providing regret bounds and empirical improvements.
Findings
O'CLOK-UCB achieves near-optimal regret bounds.
Sparse GP variant significantly speeds up computation.
Algorithms outperform previous state-of-the-art in experiments.
Abstract
We consider a contextual bandit problem with a combinatorial action set and time-varying base arm availability. At the beginning of each round, the agent observes the set of available base arms and their contexts and then selects an action that is a feasible subset of the set of available base arms to maximize its cumulative reward in the long run. We assume that the mean outcomes of base arms are samples from a Gaussian Process (GP) indexed by the context set, and the expected reward is Lipschitz continuous in expected base arm outcomes. For this setup, we propose an algorithm called Optimistic Combinatorial Learning and Optimization with Kernel Upper Confidence Bounds (O'CLOK-UCB) and prove that it incurs regret with high probability, where is the maximum…
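The round structure described in the abstract (observe the available base arms and their contexts, then pick a feasible subset optimistically) can be illustrated with a minimal sketch. This is not the authors' O'CLOK-UCB implementation. It assumes a squared-exponential kernel, a fixed exploration weight beta, feasible actions that are any k available arms, a reward that increases in each base arm outcome (so picking the k largest indices is an exact oracle), and semi-bandit feedback on the chosen arms. All function names and parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    # Squared-exponential kernel between rows of a (n, d) and b (m, d).
    # The kernel choice and lengthscale are assumptions for illustration.
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(x_obs, y_obs, x_new, noise_var=0.01):
    # Standard zero-mean GP regression: posterior mean and std at x_new.
    if len(x_obs) == 0:
        # No data yet: fall back to the prior (mean 0, unit variance).
        return np.zeros(len(x_new)), np.ones(len(x_new))
    k_oo = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_no = rbf_kernel(x_new, x_obs)
    mean = k_no @ np.linalg.solve(k_oo, y_obs)
    var = 1.0 - np.einsum("ij,ji->i", k_no, np.linalg.solve(k_oo, k_no.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def select_super_arm(x_obs, y_obs, contexts, k, beta=4.0):
    # Optimistic index per available base arm: mean + sqrt(beta) * std.
    # beta is a fixed hypothetical constant, not a tuned confidence schedule.
    mean, std = gp_posterior(x_obs, y_obs, contexts)
    index = mean + np.sqrt(beta) * std
    # Assumed oracle: the feasible action is the k arms with largest index.
    return np.argsort(-index)[:k]

def run_demo(rounds=30, k=2, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical ground-truth mean outcome as a function of a 1-D context.
    true_mean = lambda x: np.sin(3.0 * x[:, 0])
    x_obs, y_obs = np.zeros((0, 1)), np.zeros(0)
    for _ in range(rounds):
        # The set of available base arms changes every round.
        n_available = rng.integers(k, 8)
        contexts = rng.uniform(0.0, 1.0, size=(n_available, 1))
        chosen = select_super_arm(x_obs, y_obs, contexts, k)
        # Semi-bandit feedback: noisy outcomes of the chosen arms only.
        outcomes = true_mean(contexts[chosen]) + 0.1 * rng.standard_normal(k)
        x_obs = np.vstack([x_obs, contexts[chosen]])
        y_obs = np.concatenate([y_obs, outcomes])
    return x_obs, y_obs

if __name__ == "__main__":
    xs, ys = run_demo()
    print(xs.shape, ys.shape)
```

Because the GP is indexed by contexts rather than arm identities, observations from arms seen in earlier rounds inform the indices of arms that appear for the first time, which is what makes a changing arm set tractable in this sketch. Each round solves a linear system in all past observations, so the cost grows with the number of observations; that is the kind of expense a sparse GP approximation is meant to reduce.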
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Machine Learning and Data Classification
Methods: Gaussian Process
