A minimax and asymptotically optimal algorithm for stochastic bandits

Pierre Ménard (1); Aurélien Garivier (1) ((1) IMT)

arXiv:1702.07211·stat.ML·September 21, 2017·72 cites

TL;DR

The paper introduces the kl-UCB++ algorithm for stochastic bandits, achieving both asymptotic and minimax optimality, merging two key research directions with clear proofs.

Contribution

It presents the first algorithm proved to be both asymptotically optimal and minimax optimal for stochastic bandits with exponential families of distributions.

Findings

01. Proves kl-UCB++ is asymptotically optimal.

02. Shows kl-UCB++ is minimax optimal.

03. Provides simple, clear proofs for both properties.
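
For readers unfamiliar with the two notions of optimality named above, they are usually stated as follows. The notation here is an assumption, not taken from this page: K arms with means \mu_a, best mean \mu^\star, horizon T, expected regret R_T, and kl(\mu, \mu') the Kullback-Leibler divergence between the distributions of the family with means \mu and \mu'. The constant C is left unspecified.

```latex
% Lai and Robbins' asymptotic lower bound (for uniformly efficient strategies):
\liminf_{T \to \infty} \frac{R_T}{\log T}
  \;\geq\; \sum_{a \,:\, \mu_a < \mu^\star} \frac{\mu^\star - \mu_a}{\mathrm{kl}(\mu_a, \mu^\star)}

% Asymptotic optimality: the algorithm matches this bound,
\limsup_{T \to \infty} \frac{R_T}{\log T}
  \;\leq\; \sum_{a \,:\, \mu_a < \mu^\star} \frac{\mu^\star - \mu_a}{\mathrm{kl}(\mu_a, \mu^\star)}

% Minimax optimality: a distribution-free bound of the optimal order,
\sup_{\text{bandit problems}} R_T \;\leq\; C \sqrt{K T}
```

The first property is problem-dependent and concerns the long-run growth rate of the regret; the second is uniform over problems at a fixed horizon, which is why combining them in one algorithm is notable.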

Abstract

We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.
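
The abstract does not spell out the index that kl-UCB++ computes. As background only, KL-based upper-confidence indices are commonly defined as the largest mean q whose divergence from the empirical mean stays within an exploration budget. The sketch below illustrates that general idea for Bernoulli rewards (one example of an exponential family). The function names `kl_bernoulli` and `kl_index` are hypothetical, and the exploration `level` is left as a plain parameter because this page does not state the specific exploration function used by kl-UCB++.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_index(mean, pulls, level, tol=1e-9):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= level.

    `level` is an assumed, caller-supplied exploration budget; it is NOT
    the kl-UCB++ exploration function, which this page does not give.
    kl(mean, q) is increasing in q on [mean, 1], so bisection applies.
    """
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) <= level:
            lo = mid  # mid is still within the budget, move up
        else:
            hi = mid  # mid exceeds the budget, move down
    return lo


if __name__ == "__main__":
    # An arm with empirical mean 0.5 after 10 pulls and after 100 pulls.
    print(kl_index(0.5, 10, 1.0))
    print(kl_index(0.5, 100, 1.0))
```

With a fixed budget, the index shrinks toward the empirical mean as the number of pulls grows, which is the mechanism that makes under-sampled arms look more attractive to an index policy.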

Taxonomy

Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Decision-Making and Behavioral Economics