Kullback-Leibler upper confidence bounds for optimal sequential allocation

Olivier Cappé; Aurélien Garivier; Odalric-Ambrym Maillard; Rémi Munos; Gilles Stoltz

arXiv:1210.1136·math.PR·August 27, 2013

TL;DR

This paper introduces KL-UCB index policies for sequential allocation in the stochastic multi-armed bandit model. Their finite-time regret bounds asymptotically match the known lower bounds, and the paper reports improvements over existing methods for bounded rewards.

Contribution

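The paper presents a unified finite-time regret analysis of two instances of the KL-UCB index policy: kl-UCB for one-parameter exponential families and empirical KL-UCB for bounded, finitely supported distributions. It establishes the asymptotic optimality of both and reports practical improvements.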

Findings

01

Finite-time regret bounds asymptotically match the lower bounds of Lai and Robbins and of Burnetas and Katehakis.

02

Algorithms outperform existing methods on bounded reward distributions.

03

A single analysis covers both one-parameter exponential families and bounded, finitely supported distributions.

Abstract

We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins [J. R. Stat. Soc. Ser. B Stat. Methodol. 41 (1979) 148-177], based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: the kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins [Adv. in Appl. Math. 6 (1985) 4-22] and Burnetas and Katehakis [Adv. in Appl. Math. 17 (1996) 122-142], respectively. We also investigate the behavior of these…
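To make the idea of an upper confidence bound computed from the Kullback-Leibler divergence concrete, here is a minimal sketch for the simplest one-parameter exponential family, Bernoulli rewards. It is an illustration under stated assumptions, not the authors' reference implementation: the function names (`bernoulli_kl`, `kl_ucb_index`, `select_arm`), the exploration budget `log(t) + c * log(log(t))` with default `c = 0`, and the bisection solver are all choices made for this example. The index of an arm is taken as the largest mean `q` whose divergence from the empirical mean, scaled by the number of pulls, stays within the budget.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, c=0.0, iters=50):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t) + c * log(log(t)).

    The exploration budget is an assumption of this sketch; the log-log
    term is clamped at zero for small t.
    """
    budget = math.log(t) + c * math.log(max(math.log(t), 1.0))
    threshold = budget / pulls
    lo, hi = mean, 1.0
    # kl(mean, q) is increasing in q on [mean, 1], so bisection applies.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo


def select_arm(reward_sums, pull_counts, t):
    """Pull each arm once, then pick the arm with the highest index."""
    for arm, n in enumerate(pull_counts):
        if n == 0:
            return arm
    indices = [
        kl_ucb_index(s / n, n, t) for s, n in zip(reward_sums, pull_counts)
    ]
    return max(range(len(indices)), key=indices.__getitem__)
```

At round `t`, a caller would pass the per-arm reward sums and pull counts to `select_arm`, play the returned arm, and update both lists with the observed reward. The index shrinks toward the empirical mean as an arm accumulates pulls, which is what drives exploration toward under-sampled arms.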
