Sparse Stochastic Bandits

Joon Kwon; Vianney Perchet; Claire Vernade

arXiv:1706.01383 · cs.LG · June 6, 2017 · 2 cites

TL;DR

This paper addresses the sparse multi-armed bandit problem, in which only s of the d arms have a positive expected reward. It proposes an algorithm whose regret scales with s rather than d, and proves a matching lower bound over a range of parameters.

Contribution

It introduces a new algorithm for sparse bandits whose regret scales with s, proves a matching lower bound for a range of parameters, and evaluates the algorithm on simulated data.

Findings

01. Regret scales with s instead of d in sparse bandits.

02. The proposed algorithm is proven to be optimal within certain parameter ranges.

03. Performance validated on simulated data.

Abstract

In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with d (or with sqrt(d) in the minimax sense). We here consider the sparse case of this classical problem in the sense that only a small number of arms, namely s < d, have a positive expected reward. We are able to leverage this additional assumption to provide an algorithm whose regret scales with s instead of d. Moreover, we prove that this algorithm is optimal by providing a matching lower bound - at least for a wide and pertinent range of parameters that we determine - and by evaluating its performance on simulated data.
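To make the setting concrete, the sketch below builds a sparse instance (s arms with positive mean, the remaining d - s arms with mean zero) and runs a standard UCB-style index policy on it, tracking pseudo-regret. This is the classical baseline that pays for all d arms, not the algorithm proposed in the paper. The function names, the Gaussian noise model, the range of the positive means, and the exploration constant are all illustrative assumptions.

```python
import math
import random


def make_sparse_means(d, s, low=0.2, high=0.8, seed=0):
    """Hypothetical sparse instance: s arms with positive mean, d - s arms with mean 0.

    The range [low, high] for the positive means is an arbitrary choice.
    """
    rng = random.Random(seed)
    means = [rng.uniform(low, high) for _ in range(s)] + [0.0] * (d - s)
    rng.shuffle(means)
    return means


def ucb_pseudo_regret(means, horizon, noise_std=0.5, seed=0):
    """Run a standard UCB-style policy (baseline, not the paper's method).

    Returns the cumulative pseudo-regret and the number of pulls per arm.
    Rewards are assumed Gaussian around each arm's mean.
    """
    rng = random.Random(seed)
    d = len(means)
    counts = [0] * d
    sums = [0.0] * d
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= d:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Empirical mean plus an exploration bonus (constant is an assumption).
            arm = max(
                range(d),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = means[arm] + rng.gauss(0.0, noise_std)
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the pulled arm's mean.
        regret += best - means[arm]

    return regret, counts


if __name__ == "__main__":
    means = make_sparse_means(d=50, s=3, seed=1)
    regret, counts = ucb_pseudo_regret(means, horizon=5000, seed=1)
    print(f"pseudo-regret after 5000 rounds: {regret:.1f}")
    print(f"pulls spent on zero-mean arms: {sum(c for c, m in zip(counts, means) if m == 0.0)}")
```

Increasing d while holding s fixed in this baseline gives a rough sense of the cost that a sparsity-aware method aims to avoid.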

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems