A minimax and asymptotically optimal algorithm for stochastic bandits
Pierre Ménard (1), Aurélien Garivier (1) ((1) IMT)

TL;DR
The paper introduces the kl-UCB++ algorithm for stochastic bandits, achieving both asymptotic and minimax optimality, merging two key research directions with clear proofs.
Contribution
It presents the first algorithm proved to be both asymptotically optimal and minimax optimal for stochastic bandit models with exponential families of distributions.
Findings
Proves kl-UCB++ is asymptotically optimal.
Shows kl-UCB++ is minimax optimal.
Provides simple, clear proofs for both properties.
Abstract
We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.
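As a rough illustration of what a kl-UCB-style index looks like, the sketch below computes an upper confidence bound for Bernoulli rewards (one member of the exponential families mentioned in the abstract). Everything here is an assumption for illustration rather than something stated in the text above: the function names, the bisection solver, and in particular the exploration level, which is written as log_+((T/(Kn)) (log_+^2(T/(Kn)) + 1)) for horizon T, K arms and n pulls. Consult the paper for the exact definition of the index.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    # Clip away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def log_plus(x):
    """Positive part of the natural logarithm: max(log x, 0)."""
    return max(math.log(x), 0.0) if x > 0 else 0.0

def exploration_level(n, horizon, n_arms):
    """Assumed exploration level for an arm pulled n times (hypothetical form)."""
    ratio = horizon / (n_arms * n)
    return log_plus(ratio * (log_plus(ratio) ** 2 + 1.0))

def kl_ucb_index(mean, n, level, tol=1e-9):
    """Largest q in [mean, 1] such that n * kl(mean, q) <= level."""
    lo, hi = mean, 1.0
    # kl(mean, q) is increasing in q on [mean, 1], so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n * kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo
```

A bandit loop built on this sketch would pull each arm once, then repeatedly pull the arm with the largest `kl_ucb_index(mean, n, exploration_level(n, horizon, n_arms))`. Under the assumed exploration level, an arm pulled at least T/K times gets level 0, so its index collapses to its empirical mean.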
Taxonomy
Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Decision-Making and Behavioral Economics
