Context-lumpable stochastic bandits
Chung-Wei Lee, Qinghua Liu, Yasin Abbasi-Yadkori, Chi Jin, Tor Lattimore, Csaba Szepesvári

TL;DR
This paper studies a contextual bandit problem in which contexts can be grouped into a small number of clusters that share the same mean rewards. It gives algorithms with near-optimal sample complexity and regret bounds, and shows that they extend to more general low-rank bandits.
Contribution
It introduces algorithms for context-lumpable bandits with near-optimal sample complexity and regret bounds, and extends to more general low-rank bandit scenarios.
Findings
Achieves near-optimal sample complexity in PAC setting.
Provides minimax regret bounds in online setting.
Extends methods to low-rank bandits with improved regret.
Abstract
We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each round $t = 1, 2, \dots$, the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into $r \le \min\{S, K\}$ groups such that the mean reward for the various actions is the same for any two contexts that are in the same group, we give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde{O}(r(S+K)/\epsilon^2)$ samples with high probability and provide a matching $\widetilde{\Omega}(r(S+K)/\epsilon^2)$ lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde{O}(\sqrt{r^3(S+K)T})$. To the best of our knowledge, we are the…
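The interaction protocol described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm; it only builds a toy environment in which contexts are lumped into groups that share mean rewards. The function names, the uniform context distribution, the Bernoulli reward noise, and the parameter names `S` (contexts), `K` (actions), and `r` (groups) are assumptions made for illustration.

```python
import random

def make_lumpable_means(S, K, r, rng):
    """Build an S x K table of mean rewards with at most r distinct rows."""
    # Hypothetical latent group label for each context (hidden from the learner).
    group_of_context = [rng.randrange(r) for _ in range(S)]
    # One mean-reward vector per group, with entries in [0, 1).
    group_means = [[rng.random() for _ in range(K)] for _ in range(r)]
    # Every context inherits the mean-reward vector of its group.
    means = [list(group_means[g]) for g in group_of_context]
    return means, group_of_context

def play_round(means, rng, choose_action):
    """One round: draw a context, let the learner act, return a noisy reward."""
    context = rng.randrange(len(means))   # uniform contexts (assumption)
    action = choose_action(context)
    # Bernoulli reward whose mean is means[context][action] (assumption).
    reward = 1.0 if rng.random() < means[context][action] else 0.0
    return context, action, reward

rng = random.Random(0)
means, groups = make_lumpable_means(S=20, K=5, r=3, rng=rng)
distinct_rows = len({tuple(row) for row in means})
context, action, reward = play_round(means, rng, lambda c: 0)
```

Because every row of `means` is copied from one of `r` group vectors, the table has at most `r` distinct rows, which is the structure a learner could exploit in this setting.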
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
