Context-lumpable stochastic bandits

Chung-Wei Lee; Qinghua Liu; Yasin Abbasi-Yadkori; Chi Jin; Tor Lattimore; Csaba Szepesvári

arXiv:2306.13053·cs.LG·November 29, 2023



Open Access

TL;DR

This paper studies a specialized contextual bandit problem in which contexts can be grouped into a small number of clusters that share the same mean rewards. It provides algorithms with near-optimal sample complexity and regret bounds, advancing the understanding of low-rank bandit models.

Contribution

It introduces algorithms for context-lumpable bandits with near-optimal sample complexity and regret bounds, and extends them to more general low-rank bandit settings.

Findings

01. Achieves near-optimal sample complexity in the PAC setting.

02. Provides minimax regret bounds in the online setting.

03. Extends the methods to low-rank bandits with improved regret.

Abstract

We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each round $t = 1, 2, \dots$, the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into $r \leq \min\{S, K\}$ groups such that the mean reward for the various actions is the same for any two contexts that are in the same group, we give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde{O}(r(S + K)/\epsilon^{2})$ samples with high probability and provide a matching $\Omega(r(S + K)/\epsilon^{2})$ lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde{O}(\sqrt{r^{3}(S + K)T})$. To the best of our knowledge, we are the…
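To make the lumpability assumption concrete, here is a minimal Python sketch. It is not the authors' algorithm: it only builds a mean-reward table that satisfies the assumption (contexts in the same group share an identical row of mean rewards, so the $S \times K$ table has at most $r$ distinct rows) and evaluates the leading term of the $r(S + K)/\epsilon^{2}$ sample bound with constants and log factors dropped. The helper names (`make_lumpable_rewards`, `sample_counts`, `optimal_policy`), the uniform $[0, 1]$ mean rewards, and the $SK/\epsilon^{2}$ comparison figure (the scale one would expect if every context-action pair were estimated separately) are hypothetical choices for illustration, not taken from the paper.

```python
import random


def make_lumpable_rewards(S, K, r, seed=0):
    """Hypothetical helper: build an S x K table of mean rewards in which
    the S contexts are lumped into r groups. Every context in a group gets
    an identical row of mean rewards (the context-lumpable assumption)."""
    assert 1 <= r <= min(S, K), "the abstract assumes r <= min{S, K}"
    rng = random.Random(seed)
    # One mean-reward vector per group; entries drawn uniformly from [0, 1]
    # purely for illustration.
    group_means = [[rng.random() for _ in range(K)] for _ in range(r)]
    # The first r contexts cover every group; the rest are assigned at random.
    groups = [s if s < r else rng.randrange(r) for s in range(S)]
    rewards = [list(group_means[g]) for g in groups]
    return rewards, groups


def sample_counts(S, K, r, eps):
    """Leading terms only (no constants or log factors): the r(S+K)/eps^2
    bound versus an illustrative S*K/eps^2 per-pair baseline."""
    lumped = r * (S + K) / eps**2
    unstructured = S * K / eps**2
    return lumped, unstructured


def optimal_policy(rewards):
    """Best action for each context given the true mean rewards."""
    return [max(range(len(row)), key=row.__getitem__) for row in rewards]


if __name__ == "__main__":
    rewards, groups = make_lumpable_rewards(S=1000, K=50, r=5)
    # At most r distinct rows, so the table has rank at most r.
    print(len({tuple(row) for row in rewards}))
    print(sample_counts(1000, 50, 5, 0.5))  # (21000.0, 200000.0)
```

The point of the comparison is that the lumped bound grows with the sum $S + K$ rather than the product $SK$; the specific numbers above depend only on the chosen $S$, $K$, $r$ and $\epsilon$.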


Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques