Sparsity, variance and curvature in multi-armed bandits

Sébastien Bubeck; Michael B. Cohen; Yuanzhi Li

arXiv:1711.01037 · cs.LG · November 6, 2017 · 56 cites

TL;DR

This paper advances understanding of sparsity, variance, and curvature in adversarial multi-armed and linear bandits, providing new algorithms with improved regret bounds under these conditions.

Contribution

It solves several open problems by establishing regret bounds for sparse losses, bounded variation sequences, and curved action sets in bandit settings.

Findings

01

Achieved $\tilde{O}(\sqrt{sT})$ regret for $s$-sparse losses

02

Achieved $\tilde{O}(\sqrt{Q})$ regret for loss sequences with variation bounded by $Q$

03

Established regret bounds for linear bandits on $\ell_p^n$ balls for $p \in [1,2]$
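
The quantities named in the findings above can be written out explicitly. What follows is a sketch under standard conventions, none of which are stated on this page: the number of arms $K$, the horizon $T$, the bounded loss vectors $\ell_t \in \mathbb{R}^K$, the played arm $a_t$, and the mean loss vector $\bar{\ell}$ are all assumed notation, and the definition of $Q$ as a total quadratic variation follows the usual convention in the literature and may differ in detail from the paper's.

$$
\begin{aligned}
R_T &= \mathbb{E}\left[\sum_{t=1}^{T} \ell_t(a_t)\right] - \min_{i \in [K]} \sum_{t=1}^{T} \ell_t(i) && \text{(regret)} \\
\|\ell_t\|_0 &\le s \quad \text{for all } t && \text{($s$-sparse losses)} \\
Q &= \sum_{t=1}^{T} \|\ell_t - \bar{\ell}\|_2^2, \qquad \bar{\ell} = \frac{1}{T}\sum_{t=1}^{T} \ell_t && \text{(variation)}
\end{aligned}
$$

Read this way, the sparse-loss result replaces the $\sqrt{KT}$ dependence of the generic adversarial bandit bound with $\sqrt{sT}$, up to logarithmic factors, which is a gain whenever $s \ll K$.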

Abstract

In (online) learning theory the concepts of sparsity, variance and curvature are well-understood and are routinely used to obtain refined regret and generalization bounds. In this paper we further our understanding of these concepts in the more challenging limited feedback scenario. We consider the adversarial multi-armed bandit and linear bandit settings and solve several open problems pertaining to the existence of algorithms with favorable regret bounds under the following assumptions: (i) sparsity of the individual losses, (ii) small variation of the loss sequence, and (iii) curvature of the action set. Specifically we show that (i) for $s$-sparse losses one can obtain $\tilde{O}(\sqrt{sT})$-regret (solving an open problem by Kwon and Perchet), (ii) for loss sequences with variation bounded by $Q$ one can obtain $\tilde{O}(\sqrt{Q})$-regret (solving an open problem by Kale and…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · COVID-19 epidemiological studies