Sparsity, variance and curvature in multi-armed bandits
Sébastien Bubeck, Michael B. Cohen, Yuanzhi Li

TL;DR
This paper advances understanding of sparsity, variance, and curvature in adversarial multi-armed and linear bandits, providing new algorithms with improved regret bounds under these conditions.
Contribution
It solves several open problems by establishing regret bounds for sparse losses, bounded variation sequences, and curved action sets in bandit settings.
Findings
Achieved $\tilde{O}(\sqrt{sT})$ regret for $s$-sparse losses
Achieved $\tilde{O}(\sqrt{Q})$ regret for loss sequences with variation bounded by $Q$
Established regret bounds for linear bandits on $\ell_p^n$ balls for $p \in [1,2]$
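To make the two parameters in these bounds concrete, here is a minimal sketch of how $s$ and $Q$ could be computed for a given loss sequence. It assumes the usual definitions, which this page does not spell out: $s$ is the largest number of non-zero coordinates in any single loss vector, and $Q$ is the quadratic variation $\sum_t \|\ell_t - \mu\|_2^2$ around the mean loss vector $\mu$. The function names and the toy loss sequence are hypothetical, chosen only for illustration.

```python
def sparsity(losses):
    """Largest number of non-zero coordinates in any single loss vector
    (assumed definition of s-sparsity)."""
    return max(sum(1 for x in loss if x != 0) for loss in losses)


def quadratic_variation(losses):
    """Sum over rounds of the squared Euclidean distance between the loss
    vector and the mean loss vector (assumed definition of Q)."""
    T = len(losses)
    n = len(losses[0])
    mean = [sum(loss[i] for loss in losses) / T for i in range(n)]
    return sum(
        sum((loss[i] - mean[i]) ** 2 for i in range(n)) for loss in losses
    )


# Hypothetical toy sequence: T = 4 rounds, n = 3 arms, one non-zero loss per round.
losses = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

s = sparsity(losses)
Q = quadratic_variation(losses)
print(s, Q)
```

Under these assumed definitions, a sequence can be sparse ($s$ much smaller than the number of arms) or have small variation ($Q$ much smaller than $T$) independently of one another, which is why the two bounds are stated separately.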
Abstract
In (online) learning theory the concepts of sparsity, variance and curvature are well-understood and are routinely used to obtain refined regret and generalization bounds. In this paper we further our understanding of these concepts in the more challenging limited feedback scenario. We consider the adversarial multi-armed bandit and linear bandit settings and solve several open problems pertaining to the existence of algorithms with favorable regret bounds under the following assumptions: (i) sparsity of the individual losses, (ii) small variation of the loss sequence, and (iii) curvature of the action set. Specifically we show that (i) for $s$-sparse losses one can obtain $\tilde{O}(\sqrt{sT})$-regret (solving an open problem by Kwon and Perchet), (ii) for loss sequences with variation bounded by $Q$ one can obtain $\tilde{O}(\sqrt{Q})$-regret (solving an open problem by Kale and…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · COVID-19 epidemiological studies
