Sparse Stochastic Bandits
Joon Kwon, Vianney Perchet, Claire Vernade

TL;DR
This paper addresses the sparse multi-armed bandit problem, in which only s of the d arms have a positive expected reward. It proposes an algorithm that exploits this sparsity to achieve regret scaling with s rather than d, and proves a matching lower bound over a wide range of parameters.
Contribution
It introduces a new algorithm for sparse bandits whose regret scales with s, establishes its optimality through a matching lower bound (for a range of parameters the authors determine), and evaluates its performance in simulations.
Findings
Regret scales with s instead of d in sparse bandits.
The proposed algorithm is proven optimal, via a matching lower bound, over a wide range of parameters.
Performance validated on simulated data.
Abstract
In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with d (or with sqrt(d) in the minimax sense). We here consider the sparse case of this classical problem in the sense that only a small number of arms, namely s < d, have a positive expected reward. We are able to leverage this additional assumption to provide an algorithm whose regret scales with s instead of d. Moreover, we prove that this algorithm is optimal by providing a matching lower bound (at least for a wide and pertinent range of parameters that we determine) and by evaluating its performance on simulated data.
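To make the setting in the abstract concrete, here is a minimal sketch of a sparse bandit instance together with a standard UCB1 baseline. It is not the paper's algorithm: UCB1 ignores sparsity, so it is exactly the kind of method whose regret grows with d. The function names, the unit-variance Gaussian rewards, and the choice of mean 0 for the non-positive arms are all assumptions made for illustration.

```python
import math
import random


def make_sparse_means(d, s, positive_mean=0.5):
    """Hypothetical helper: d arm means, of which exactly s are positive.

    The remaining d - s arms are given mean 0 (an assumption; the abstract
    only requires that they do not have a positive expected reward).
    """
    if not 0 < s < d:
        raise ValueError("need 0 < s < d")
    return [positive_mean] * s + [0.0] * (d - s)


def ucb1_pseudo_regret(means, horizon, seed=0):
    """Run standard UCB1 (a sparsity-agnostic baseline) and return
    (pseudo-regret, pull counts). Rewards are Gaussian with unit variance,
    which is an assumption of this sketch."""
    rng = random.Random(seed)
    d = len(means)
    counts = [0] * d      # number of pulls per arm
    sums = [0.0] * d      # cumulative observed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= d:
            arm = t - 1   # pull every arm once to initialise the indices
        else:
            # UCB index: empirical mean plus exploration bonus
            arm = max(
                range(d),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]   # pseudo-regret uses true means
    return regret, counts


means = make_sparse_means(d=10, s=2)
regret, counts = ucb1_pseudo_regret(means, horizon=500)
```

Because UCB1 must sample each of the d - s zero-mean arms often enough to rule it out, its pseudo-regret here accumulates over all d arms; an algorithm that exploits sparsity aims to pay mainly for the s positive ones.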
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
