Bounded regret in stochastic multi-armed bandits
Sébastien Bubeck, Vianney Perchet, Philippe Rigollet

TL;DR
This paper introduces a new randomized policy for stochastic multi-armed bandits that achieves bounded regret over time when the optimal arm's value and a positive gap lower bound are known, with proofs of lower bounds clarifying the limits of bounded regret.
Contribution
The paper presents a new randomized policy whose regret stays uniformly bounded over time when both the optimal arm's value and a positive lower bound on the smallest gap are known, together with lower bounds showing what is lost when only one of the two is known.
Findings
The proposed policy achieves bounded regret when the optimal arm's value and a positive gap lower bound are known.
Lower bounds show that bounded regret is impossible if only the gap lower bound Δ is known.
Bounded regret of order 1/Δ cannot be achieved if only the optimal value is known.
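For reference, the quantities behind these findings can be written out as follows. The notation is the standard one for stochastic bandits and is an assumption here, not quoted from this page: K arms with means \mu_i, I_t the arm pulled at round t, and T_i(n) the number of pulls of arm i up to round n.

```latex
% Gap of arm i and the known lower bound \Delta on the smallest positive gap
\Delta_i = \mu^{\star} - \mu_i, \qquad
\mu^{\star} = \max_{1 \le i \le K} \mu_i, \qquad
0 < \Delta \le \min_{i \,:\, \Delta_i > 0} \Delta_i

% (Pseudo-)regret after n rounds
R_n = n\,\mu^{\star} - \mathbb{E}\sum_{t=1}^{n} \mu_{I_t}
    = \sum_{i \,:\, \Delta_i > 0} \Delta_i \,\mathbb{E}\bigl[T_i(n)\bigr]

% "Bounded regret" means the regret does not grow with the horizon
\sup_{n \ge 1} R_n < \infty
```

In this notation, "bounded regret of order 1/Δ" reads as a bound on R_n proportional to 1/Δ that holds for every n.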
Abstract
We study the stochastic multi-armed bandit problem when one knows the value μ* of an optimal arm, as well as a positive lower bound on the smallest positive gap Δ. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows Δ, and bounded regret of order 1/Δ is not possible if one only knows μ*.
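To illustrate why knowing both quantities helps, here is a minimal simulation sketch. It is not the paper's policy (which is randomized); it is a simplified deterministic threshold rule written for illustration only. The function name `threshold_policy`, the Gaussian reward model with unit variance, and the midpoint threshold μ* − Δ/2 are all assumptions of this sketch. The idea it shows: the optimal arm has mean μ* and every suboptimal arm has mean at most μ* − Δ, so the midpoint separates them, and a player can commit to an arm whose empirical mean clears that midpoint.

```python
import random


def threshold_policy(means, mu_star, delta, horizon, seed=0):
    """Simulate a simplified threshold rule (illustrative, not the paper's policy).

    means   : true arm means (hidden from the decision rule, used only to
              draw rewards and to measure regret)
    mu_star : known value of the optimal arm
    delta   : known positive lower bound on the smallest positive gap
    Returns (cumulative pseudo-regret, pull counts per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    # Midpoint between the optimal mean and the best possible suboptimal mean.
    threshold = mu_star - delta / 2.0

    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once to initialize the estimates
        else:
            emp = [sums[i] / counts[i] for i in range(k)]
            best = max(range(k), key=lambda i: emp[i])
            if emp[best] > threshold:
                arm = best   # some arm looks optimal: exploit it
            else:
                arm = t % k  # no arm clears the threshold: cycle through arms
        # Assumed reward model: Gaussian with unit variance.
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += mu_star - means[arm]

    return regret, counts


if __name__ == "__main__":
    regret, counts = threshold_policy([0.5, 0.0], mu_star=0.5, delta=0.5, horizon=10_000)
    print("pseudo-regret:", regret, "pulls:", counts)
```

Running this for several horizons and seeds lets one check empirically whether the accumulated pseudo-regret flattens out. No guarantee is claimed for this simplified rule; the bounded-regret result stated in the abstract applies to the authors' policy.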
Taxonomy
Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Smart Grid Energy Management
