Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

Emilie Kaufmann; Nathaniel Korda; Rémi Munos

arXiv:1205.4217 · stat.ML · July 20, 2012 · 34 cites

TL;DR

This paper proves that Thompson Sampling is asymptotically optimal for Bernoulli bandit problems. It gives the first finite-time analysis that matches the Lai and Robbins lower bound and complements the proof with numerical comparisons.
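For Bernoulli rewards, Thompson Sampling is short to state. It keeps a Beta posterior for each arm, draws one sample from each posterior, and plays the arm with the largest sample. The following is a minimal sketch that assumes a uniform Beta(1, 1) prior on every arm. The function name, the seed handling and the simulated environment are illustrative assumptions, not code from the paper.

```python
import random


def thompson_sampling(arm_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling with a uniform Beta(1, 1) prior (assumed).

    arm_means: true Bernoulli success probabilities, used only to simulate rewards.
    Returns the list of arms pulled and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [0] * k  # S_a: rewards equal to 1 observed on arm a
    failures = [0] * k   # F_a: rewards equal to 0 observed on arm a
    best_mean = max(arm_means)
    pulls = []
    regret = 0.0
    for _ in range(horizon):
        # Sample once from each posterior Beta(S_a + 1, F_a + 1) ...
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        # ... and play the arm whose sample is largest.
        arm = max(range(k), key=lambda a: samples[a])
        if rng.random() < arm_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls.append(arm)
        # Pseudo-regret: gap between the best mean and the mean of the arm played.
        regret += best_mean - arm_means[arm]
    return pulls, regret
```

For example, `thompson_sampling([0.1, 0.5, 0.9], horizon=2000)` uses a hypothetical three-arm instance. On that instance, the posterior of the best arm concentrates quickly, so almost all pulls go to the last arm.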

Contribution

It gives the first finite-time analysis establishing Thompson Sampling's asymptotic optimality for Bernoulli rewards, resolving a question that had been open since 1933.

Findings

01

Thompson Sampling achieves the Lai and Robbins lower bound asymptotically.

02

Numerical experiments compare Thompson Sampling with other optimal policies.

03

The analysis confirms the theoretical optimality of Thompson Sampling for Bernoulli bandits.
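The following small simulation sketch gives the flavour of such a comparison. It runs Beta-Bernoulli Thompson Sampling against KL-UCB, one asymptotically optimal index policy for Bernoulli rewards. Several details are assumptions for illustration and do not reproduce the paper's experimental protocol: the choice of KL-UCB as the competitor, its simplified exploration term `log(t)`, the arm means, the horizon and the number of runs. The helper `lai_robbins_constant` returns the constant that multiplies log T in the Lai and Robbins bound.

```python
import math
import random


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence d(p, q) between Bernoulli(p) and Bernoulli(q), clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, iters=30):
    """Approximate the largest q in [mean, 1] with pulls * d(mean, q) <= log(t), by bisection."""
    bound = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo


def run(policy, arm_means, horizon, rng):
    """Play one bandit run with policy "ts" or "klucb" and return the cumulative pseudo-regret."""
    k = len(arm_means)
    succ, pulls = [0] * k, [0] * k
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if policy == "ts":
            # Thompson Sampling: sample from the Beta(S_a + 1, F_a + 1) posteriors.
            scores = [rng.betavariate(succ[a] + 1, pulls[a] - succ[a] + 1) for a in range(k)]
        elif t <= k:
            # KL-UCB: pull each arm once to initialise its empirical mean.
            scores = [1.0 if a == t - 1 else 0.0 for a in range(k)]
        else:
            scores = [kl_ucb_index(succ[a] / pulls[a], pulls[a], t) for a in range(k)]
        arm = max(range(k), key=lambda a: scores[a])
        succ[arm] += 1 if rng.random() < arm_means[arm] else 0
        pulls[arm] += 1
        regret += best - arm_means[arm]
    return regret


def lai_robbins_constant(arm_means):
    """Sum over suboptimal arms of (mu* - mu_a) / d(mu_a, mu*)."""
    best = max(arm_means)
    return sum((best - m) / bernoulli_kl(m, best) for m in arm_means if m < best)


def average_regret(policy, arm_means, horizon, runs=20, seed=0):
    """Average pseudo-regret of `policy` over independent simulated runs."""
    rng = random.Random(seed)
    return sum(run(policy, arm_means, horizon, rng) for _ in range(runs)) / runs
```

Run `average_regret("ts", means, horizon)` and `average_regret("klucb", means, horizon)` on the same hypothetical instance, then compare both with `lai_robbins_constant(means) * math.log(horizon)`. This gives a rough picture of how the two policies perform. The bound is only asymptotic, so at short horizons the constant can predict either policy's regret poorly.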

Abstract

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
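The lower bound mentioned in the abstract can be written out for Bernoulli arms. The notation below is chosen for illustration and is not necessarily the paper's. Arm a has mean μ_a, μ^* is the largest mean, N_a(T) counts the pulls of arm a up to round T, and d is the Bernoulli Kullback-Leibler divergence.

```latex
% Expected cumulative regret after T rounds
R(T) = \sum_{a \,:\, \mu_a < \mu^*} (\mu^* - \mu_a)\, \mathbb{E}[N_a(T)]

% Bernoulli Kullback-Leibler divergence
d(p, q) = p \log\frac{p}{q} + (1 - p) \log\frac{1 - p}{1 - q}

% Lai and Robbins (1985): every uniformly efficient policy satisfies
\liminf_{T \to \infty} \frac{R(T)}{\log T} \;\ge\; \sum_{a \,:\, \mu_a < \mu^*} \frac{\mu^* - \mu_a}{d(\mu_a, \mu^*)}

% Asymptotic optimality: the regret also satisfies the matching upper bound
\limsup_{T \to \infty} \frac{R(T)}{\log T} \;\le\; \sum_{a \,:\, \mu_a < \mu^*} \frac{\mu^* - \mu_a}{d(\mu_a, \mu^*)}
```

The paper's finite-time analysis establishes the matching upper bound for Thompson Sampling. Together with the lower bound, this means its regret grows like the optimal constant times log T.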

Code & Models

Repositories

Ralami1859/Stochastic-Multi-Armed-Bandit

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management