Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Aurélien Garivier (IMT); Pierre Ménard (IMT); Gilles Stoltz (GREGH)

arXiv:1602.07182·math.ST·October 16, 2018

TL;DR

This paper revisits regret lower bounds in multi-armed bandit problems, providing non-asymptotic, distribution-dependent bounds that clarify the phases of regret growth and simplify proof techniques.

Contribution

The paper gives straightforward, information-theoretic proofs of non-asymptotic regret lower bounds, which exhibit an initial phase of almost linear growth followed by a final logarithmic phase.

Findings

01. In an initial phase, the regret grows almost linearly.

02. The well-known logarithmic growth of the regret holds only in a final phase.

03. The proofs are simplified and rely only on well-known properties of Kullback-Leibler divergences.

Abstract

We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.
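A small numerical sketch can make the "final phase only" claim concrete. It is not taken from the paper: it assumes Bernoulli arms with hypothetical means, and it uses the classical asymptotic lower-bound rate (usually attributed to Lai and Robbins), in which the regret of a consistent strategy eventually grows like ln T times the sum over suboptimal arms of gap / KL(arm, best arm). The function names are illustrative. The script compares that expression with the trivial ceiling T times the largest gap, which no strategy can exceed.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Ber(p), Ber(q)) in nats; assumes 0 <= p <= 1 and 0 < q < 1."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def log_regret_constant(means):
    """Sum over suboptimal arms of gap / KL(arm, best arm).

    Assumes the best mean is strictly below 1 so the divergences are finite.
    """
    best = max(means)
    return sum((best - mu) / bernoulli_kl(mu, best) for mu in means if mu < best)


def max_possible_regret(means, horizon):
    """Regret of always pulling the worst arm: a ceiling for any strategy."""
    return horizon * (max(means) - min(means))


# Hypothetical three-armed Bernoulli problem (means chosen for illustration).
means = [0.5, 0.45, 0.4]
constant = log_regret_constant(means)

for horizon in (10, 100, 1_000, 10_000, 100_000):
    asymptotic = constant * math.log(horizon)
    ceiling = max_possible_regret(means, horizon)
    print(f"T={horizon:>6}  c*ln(T)={asymptotic:8.1f}  T*max_gap={ceiling:8.1f}")
```

For short horizons the logarithmic expression is larger than the ceiling, so it cannot describe the regret there; the regret can grow at most linearly in that range. This is consistent with the abstract's statement that logarithmic growth only holds in a final phase, although the paper's own non-asymptotic bounds are more precise than this comparison.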
