A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
Odalric-Ambrym Maillard (INRIA Lille - Nord Europe), Rémi Munos (INRIA Lille - Nord Europe), Gilles Stoltz (DMA, GREGH, INRIA Paris - Rocquencourt)

TL;DR
This paper provides a finite-time analysis of a Kullback-Leibler-based algorithm for stochastic multi-armed bandits with finite-support distributions. The resulting regret bounds have main terms that are smaller than those of previously analyzed algorithms, such as UCB-type algorithms.
Contribution
It provides a finite-time analysis of a KL-based bandit algorithm for finite-support distributions. The asymptotic regret of this algorithm matches the lower bound of Burnetas and Katehakis (1996).
Findings
The main terms of the finite-time regret bounds are smaller than those of previously analyzed UCB-type algorithms.
The algorithm's asymptotic regret matches the lower bound of Burnetas and Katehakis (1996).
The bounds therefore give improved guarantees at finite horizons, not only in the limit.
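For reference, the asymptotic lower bound of Burnetas and Katehakis (1996) is commonly stated as follows. The notation here is ours and does not come from the summary above. Here $\nu_a$ is the reward distribution of arm $a$, $\mu_a$ its mean, $\mu^\star$ the best mean, $\Delta_a = \mu^\star - \mu_a$, $N_a(T)$ the number of pulls of arm $a$ up to time $T$, and $\mathcal{P}$ the set of distributions allowed by the model. The bound applies to uniformly good (consistent) policies. "Matching" it means the regret grows like $\log T$ with exactly this constant.

```latex
\mathbb{E}[R_T] = \sum_{a} \Delta_a \, \mathbb{E}[N_a(T)],
\qquad
\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
  \;\ge\; \sum_{a:\, \mu_a < \mu^\star} \frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a, \mu^\star)},
\qquad
\mathcal{K}_{\inf}(\nu_a, \mu^\star)
  = \inf\bigl\{ \mathrm{KL}(\nu_a, \nu) : \nu \in \mathcal{P},\ \mathbb{E}(\nu) > \mu^\star \bigr\}
```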
Abstract
We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).
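The abstract does not spell out the algorithm's index. The sketch below is a hedged illustration of one KL-based upper-confidence index, restricted to rewards supported on {0, 1}. In that two-point case, the KL divergence between two finite-support distributions reduces to the Bernoulli divergence kl(p, q). The sketch makes several assumptions for illustration:

- the exploration threshold f(t) = log t,
- the bisection solver,
- the round-robin initialization,
- all function names.

It is not the paper's exact algorithm, which handles arbitrary finite supports.

```python
import math
import random


def kl_bernoulli(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Ber(p) || Ber(q)); q is clipped away from 0 and 1 to avoid log(0)."""
    q = min(max(q, eps), 1.0 - eps)
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_ucb_index(p_hat: float, pulls: int, threshold: float, iters: int = 50) -> float:
    """Largest q in [p_hat, 1] with pulls * kl(p_hat, q) <= threshold.

    kl(p_hat, q) is increasing in q on [p_hat, 1], so bisection applies.
    """
    lo, hi = p_hat, 1.0  # lo is always feasible, since kl(p_hat, p_hat) = 0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if pulls * kl_bernoulli(p_hat, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo


def run_kl_ucb(means: list[float], horizon: int, seed: int = 0) -> list[int]:
    """Simulate the index policy on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # hypothetical initialization: pull each arm once
        else:
            threshold = math.log(t)  # assumed exploration function f(t) = log t
            indices = [
                kl_ucb_index(sums[a] / pulls[a], pulls[a], threshold) for a in range(k)
            ]
            arm = max(range(k), key=lambda a: indices[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


if __name__ == "__main__":
    print(run_kl_ucb([0.2, 0.5, 0.8], horizon=2000, seed=1))
```

With means (0.2, 0.5, 0.8) and a horizon of a few thousand rounds, most pulls should go to the last arm. Each suboptimal arm is pulled roughly in proportion to log t divided by its divergence from the best mean. This logarithmic behavior is what the asymptotic lower bound describes.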
