Active Context Selection Improves Simple Regret in Contextual Bandits
Mohammad Shahverdikondori, Jalal Etesami, Negar Kiyavash

TL;DR
This paper investigates how active context sampling in contextual bandits can significantly reduce simple regret, providing tight theoretical guarantees and practical algorithms for both known and unknown context distributions.
Contribution
It introduces a framework for active context selection in contextual bandits, deriving tight regret bounds and proposing the EETC algorithm for unknown distributions.
Findings
Active sampling improves the regret rate by up to a factor of Θ(k^{1/4}) over passive sampling.
Theoretical regret bounds are established for known and unknown context distributions.
Experiments validate the theoretical improvements on synthetic and real data.
Abstract
We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, while remaining instance-dependent with respect to the context distribution vector . Akin to experimental design problems where the population of interest is fixed but the sampled subpopulation can be controlled, we allow the learner to actively choose which context to sample from. For a known , we characterize tight regret rates: passive sampling where contexts are randomly revealed achieves regret of order , whereas active sampling with allocation achieves the tight rate . The resulting improvement can be as…
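The evaluation metric and the two sampling regimes described in the abstract can be made concrete with a small simulation. Below is a minimal NumPy sketch that rests on several assumptions. Context-weighted simple regret is taken to be the sum over contexts c of p_c times the gap between context c's best arm and the arm recommended for it. Rewards are assumed to be unit-variance Gaussian. Within each context, arms are explored round-robin, and the empirically best arm is recommended. The helper names `context_weighted_simple_regret` and `run` and the `alloc` vector are hypothetical. The sketch does not implement the paper's allocation rule or the EETC algorithm.

```python
import numpy as np

def context_weighted_simple_regret(p, means, rec):
    """p: (k,) context probabilities; means: (k, K) mean rewards; rec: (k,) arm per context."""
    means = np.asarray(means, dtype=float)
    best = means.max(axis=1)                              # best mean reward in each context
    chosen = means[np.arange(means.shape[0]), rec]        # mean reward of the recommended arm
    return float(np.dot(p, best - chosen))                # gaps weighted by context probability

def run(p, means, n, alloc=None, rng=None):
    """Spend a budget of n samples and return one recommended arm per context.

    alloc=None: passive sampling, where each round's context is drawn from p.
    alloc given: active sampling, where context c is sampled floor(n * alloc[c]) times.
    (alloc is a hypothetical user-supplied allocation, not the paper's rule.)
    """
    rng = np.random.default_rng() if rng is None else rng
    means = np.asarray(means, dtype=float)
    k, K = means.shape
    if alloc is None:
        counts = np.bincount(rng.choice(k, size=n, p=p), minlength=k)
    else:
        counts = np.floor(n * np.asarray(alloc, dtype=float)).astype(int)
    rec = np.zeros(k, dtype=int)
    for c in range(k):
        sums = np.zeros(K)
        pulls = np.zeros(K)
        for t in range(counts[c]):
            a = t % K                                     # round-robin exploration (assumption)
            sums[a] += rng.normal(means[c, a], 1.0)       # unit-variance Gaussian reward (assumption)
            pulls[a] += 1
        # Unpulled arms get -inf so they are never preferred over a sampled arm.
        emp = np.where(pulls > 0, sums / np.maximum(pulls, 1), -np.inf)
        rec[c] = int(np.argmax(emp))
    return rec
```

For example, `run(p, means, n=600, rng=np.random.default_rng(0))` simulates passive sampling. Passing `alloc=[1/3, 1/3, 1/3]` instead fixes the per-context budget in advance. Feeding either result into `context_weighted_simple_regret` lets the two regimes be compared empirically on a chosen instance.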
