Bounded regret in stochastic multi-armed bandits

Sébastien Bubeck; Vianney Perchet; Philippe Rigollet

arXiv:1302.1611 · math.ST · February 13, 2013 · 21 cites

Open Access

TL;DR

This paper introduces a randomized policy for stochastic multi-armed bandits whose regret stays bounded uniformly over time when the value of an optimal arm and a positive lower bound on the smallest positive gap are both known. It also proves lower bounds on what is achievable when only one of the two is known.

Contribution

The paper presents a randomized policy whose regret is bounded uniformly over time when both the value of an optimal arm and a positive lower bound on the smallest positive gap are known, together with lower bounds for the cases where only one of these quantities is known.

Findings

01

The proposed policy achieves bounded regret when the optimal arm's value and a positive gap lower bound are known.

02

Lower bounds show that bounded regret is impossible if only the gap Δ is known.

03

Bounded regret of order 1/Δ cannot be achieved if only the optimal value is known.

Abstract

We study the stochastic multi-armed bandit problem when one knows the value $μ^{(⋆)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $Δ$. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $Δ$, and bounded regret of order $1/Δ$ is not possible if one only knows $μ^{(⋆)}$.
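
For readers new to the setting, the quantities named in the abstract can be written out in the usual bandit notation. This is a sketch under assumed conventions, since this page does not define them: $K$ arms with mean rewards $μ_1, \dots, μ_K$, the arm $I_t$ pulled at round $t$, and the number of pulls $T_i(n)$ of arm $i$ up to round $n$ are all notation introduced here for illustration. With $μ^{(⋆)} = \max_i μ_i$ and per-arm gaps $Δ_i = μ^{(⋆)} - μ_i$, the regret after $n$ rounds is commonly defined as

$$R_n = n\,μ^{(⋆)} - \mathbb{E}\sum_{t=1}^{n} μ_{I_t} = \sum_{i\,:\,Δ_i > 0} Δ_i\,\mathbb{E}[T_i(n)].$$

Under this definition, a regret "uniformly bounded over time" means $\sup_{n \ge 1} R_n < \infty$: the expected number of pulls of every suboptimal arm stays finite no matter how long the policy runs.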

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Smart Grid Energy Management