Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
Emilie Kaufmann, Nathaniel Korda, Rémi Munos

TL;DR
This paper proves that Thompson Sampling is asymptotically optimal for Bernoulli bandit problems by providing the first finite-time analysis matching the Lai and Robbins lower bound, supported by numerical comparisons.
Contribution
It offers the first finite-time analysis confirming Thompson Sampling's asymptotic optimality for Bernoulli rewards, resolving a long-standing open problem.
Findings
Thompson Sampling achieves the Lai and Robbins lower bound asymptotically.
Numerical experiments compare Thompson Sampling with other optimal policies.
The analysis confirms the theoretical optimality of Thompson Sampling for Bernoulli bandits.
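For reference, the Lai and Robbins lower bound named in the findings can be written as below. The notation is introduced here for illustration and does not come from this page: \mu_a is the mean of arm a, \mu^* is the largest mean, R_T is the cumulative regret after T rounds, and K(p, q) is the Kullback-Leibler divergence between Bernoulli distributions with means p and q. The bound holds for any uniformly efficient policy, and "matching" it means attaining the same constant in front of \ln T.

```latex
\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\ln T}
  \;\ge\; \sum_{a \,:\, \mu_a < \mu^*} \frac{\mu^* - \mu_a}{K(\mu_a, \mu^*)},
\qquad
K(p, q) = p \ln\frac{p}{q} + (1 - p) \ln\frac{1 - p}{1 - q}.
```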
Abstract
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
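As an illustration of the policy the abstract refers to, here is a minimal sketch of Thompson Sampling for Bernoulli rewards. It assumes a uniform Beta(1, 1) prior on each arm, which is a common choice and an assumption of this sketch. The function name `thompson_sampling`, its parameters, and the simulated arm means are hypothetical and are not taken from the paper.

```python
import random


def thompson_sampling(means, horizon, seed=0):
    """Simulate Thompson Sampling on Bernoulli arms (illustrative sketch).

    means   -- true success probabilities of the arms (hidden from the policy)
    horizon -- number of rounds to play
    Returns (pulls per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # observed rewards equal to 1, per arm
    failures = [0] * k   # observed rewards equal to 0, per arm
    best = max(means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its Beta(S + 1, F + 1) posterior,
        # which is the posterior under an assumed uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(k)
        ]
        # Play the arm whose posterior sample is largest.
        arm = max(range(k), key=samples.__getitem__)

        # Observe a Bernoulli reward and update that arm's counts.
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

        # Pseudo-regret: gap between the best mean and the played arm's mean.
        regret += best - means[arm]

    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, regret


if __name__ == "__main__":
    pulls, regret = thompson_sampling([0.2, 0.8], horizon=2000, seed=1)
    print("pulls per arm:", pulls)
    print("cumulative pseudo-regret:", round(regret, 2))
```

The sketch tracks pseudo-regret (computed from the true means) rather than realized reward differences, since that is the quantity a logarithmic regret bound describes in expectation.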
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
