A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

Odalric-Ambrym Maillard (INRIA Lille - Nord Europe); Rémi Munos (INRIA Lille - Nord Europe); Gilles Stoltz (DMA, GREGH, INRIA Paris - Rocquencourt)

arXiv:1105.5820 · math.ST · June 1, 2011 · COLT · 121 cites

TL;DR

This paper provides a finite-time analysis of a Kullback-Leibler-based algorithm for stochastic multi-armed bandits whose reward distributions have finite supports. The main terms of the resulting regret bounds are smaller than those of previously analyzed algorithms, such as UCB-type algorithms.

Contribution

It provides finite-time regret bounds for a KL-based bandit algorithm on distributions with finite supports (not necessarily known beforehand); the algorithm's asymptotic regret matches the lower bound of Burnetas and Katehakis (1996).

Findings

01

The main terms of the finite-time regret bounds are smaller than those of previously analyzed algorithms, such as UCB-type algorithms.

02

The algorithm's asymptotic regret matches the lower bound of Burnetas and Katehakis (1996).

03

The guarantees hold at finite horizons, not only asymptotically, and the supports of the distributions need not be known beforehand.

Abstract

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).
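To make the role of the Kullback-Leibler divergence concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm. It computes the KL divergence between two distributions on a common finite support and then, for a hypothetical two-armed Bernoulli instance (the means 0.9 and 0.8 are made up), compares the coefficient gap / KL that appears in KL-type asymptotic lower bounds for Bernoulli arms with the coefficient 1 / (2 gap) obtained by replacing the divergence with its Pinsker lower bound 2 gap^2. The function names are chosen for this example only.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two distributions on the same finite support.

    p and q are sequences of probabilities listed in the same order.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def bernoulli_kl(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b)."""
    return kl_divergence([1.0 - a, a], [1.0 - b, b])


# Hypothetical instance (assumption): best arm has mean 0.9, the other 0.8.
mu_star, mu = 0.9, 0.8
gap = mu_star - mu

# Coefficient of log(T) per suboptimal arm in a KL-type bound (Bernoulli case).
kl_coeff = gap / bernoulli_kl(mu, mu_star)

# Coefficient obtained if the divergence is replaced by Pinsker's
# lower bound 2 * gap**2, i.e. gap / (2 * gap**2).
pinsker_coeff = 1.0 / (2.0 * gap)

print(f"gap / KL      = {kl_coeff:.3f}")
print(f"1 / (2 * gap) = {pinsker_coeff:.3f}")
```

Because Pinsker's inequality gives KL(Bernoulli(a), Bernoulli(b)) >= 2 (a - b)^2, the KL-based coefficient is never larger than the gap-based one. This is the sense in which a KL-based main term can be smaller than one depending on the gap alone.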

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics