Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Aurélien Garivier (IMT), Pierre Ménard (IMT), Gilles Stoltz (GREGH)

TL;DR
This paper revisits regret lower bounds in multi-armed bandit problems, providing non-asymptotic, distribution-dependent bounds that clarify the phases of regret growth and simplify proof techniques.
Contribution
The paper gives straightforward information-theoretic proofs of non-asymptotic lower bounds on the regret, which exhibit an initial phase of almost linear growth followed by a final logarithmic phase.
Findings
Regret grows almost linearly initially
Logarithmic regret growth occurs only in the final phase
Proof techniques are simplified and more direct
Abstract
We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.
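Illustration (not taken from the paper): a minimal sketch of why the logarithmic rate cannot describe the regret at small horizons. It assumes Bernoulli arms, for which the classical asymptotic lower bound for consistent strategies has constant equal to the sum over suboptimal arms of gap / KL(Ber(mu_a), Ber(mu*)). The arm means and the helper names kl_bernoulli and log_regret_constant are hypothetical choices made for this example. The sketch compares c * ln(T) with the trivial ceiling T * (largest gap), which the regret of any strategy can never exceed.

```python
import math


def kl_bernoulli(p, q):
    """KL(Ber(p), Ber(q)) in nats, for 0 <= p <= 1 and 0 < q < 1."""
    def term(a, b):
        # Convention 0 * log(0 / b) = 0.
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)


def log_regret_constant(means):
    """Constant c of the asymptotic c * ln(T) lower bound for Bernoulli arms."""
    mu_star = max(means)
    return sum(
        (mu_star - mu) / kl_bernoulli(mu, mu_star)
        for mu in means
        if mu < mu_star
    )


# Hypothetical arm means, chosen only for illustration.
means = [0.5, 0.45, 0.4]
c = log_regret_constant(means)
largest_gap = max(means) - min(means)

for T in (10**2, 10**4, 10**6):
    asymptotic = c * math.log(T)   # classical logarithmic rate
    ceiling = largest_gap * T      # regret of always pulling the worst arm
    print(f"T={T:>7}  c*ln(T)={asymptotic:10.2f}  T*max_gap={ceiling:12.2f}")
```

At the smallest horizon the logarithmic expression lies above the trivial ceiling, so it cannot be a valid lower bound there. This is consistent with the abstract's statement that the logarithmic growth only holds in a final phase.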
