Prior-free and prior-dependent regret bounds for Thompson Sampling

Sébastien Bubeck; Che-Yu Liu

arXiv:1304.5758·stat.ML·October 4, 2013

TL;DR

This paper analyzes the Bayesian regret of Thompson Sampling in the stochastic multi-armed bandit problem. It establishes a prior-free upper bound that is optimal up to a constant factor, and shows that under certain priors the regret stays bounded uniformly over time.

Contribution

It proves that Thompson Sampling attains a prior-free Bayesian regret bound of order √(nK), matched by a lower bound of the same order, and shows that specific priors lead to regret that is uniformly bounded over time.

Findings

01

For any prior distribution, the Bayesian regret of Thompson Sampling over n rounds with K arms is at most 14√(nK), which is optimal up to a constant factor.

02

There exists a prior distribution under which any algorithm has a Bayesian regret of at least (1/20)√(nK).

03

For certain priors in the setting of Bubeck et al. [2013], the regret of Thompson Sampling is bounded uniformly over time.
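The prior-free upper bound can be illustrated with a small simulation. The following is a minimal sketch, not the paper's construction: it assumes Bernoulli rewards with independent uniform priors on the arm means (one particular prior among those the bound covers), and the function names `thompson_sampling_regret`, `bayesian_regret_estimate`, and `upper_bound` are hypothetical. It estimates the Bayesian regret by Monte Carlo and exposes the 14√(nK) value for comparison.

```python
import math
import random


def thompson_sampling_regret(means, n, rng):
    """Run Beta-Bernoulli Thompson Sampling for n rounds; return the pseudo-regret."""
    k = len(means)
    successes = [0] * k
    failures = [0] * k
    best = max(means)
    regret = 0.0
    for _ in range(n):
        # Draw one sample per arm from its Beta posterior.
        # Assumption: a uniform Beta(1, 1) prior on every arm mean.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Play the arm whose posterior sample is largest.
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        # Accumulate the gap between the best mean and the played arm's mean.
        regret += best - means[arm]
    return regret


def bayesian_regret_estimate(k, n, runs, seed=0):
    """Average the regret over arm means drawn from the (assumed) uniform prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        means = [rng.random() for _ in range(k)]
        total += thompson_sampling_regret(means, n, rng)
    return total / runs


def upper_bound(k, n):
    """The prior-free upper bound 14 * sqrt(n * K) stated in the findings."""
    return 14 * math.sqrt(n * k)
```

Calling `bayesian_regret_estimate(5, 2000, 10)` and comparing the result with `upper_bound(5, 2000)` gives a rough sanity check. The bound only says something beyond the trivial regret of at most n when 14√(nK) < n, that is, when n > 196K, and a Monte Carlo average over a few runs is a noisy estimate of the Bayesian regret, not a verification of the theorem.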

Abstract

We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We are interested in studying prior-free and prior-dependent regret bounds, very much in the same spirit as the usual distribution-free and distribution-dependent bounds for the non-Bayesian stochastic bandit. Building on the techniques of Audibert and Bubeck [2009] and Russo and Roy [2013] we first show that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by $14\sqrt{nK}$. This result is unimprovable in the sense that there exists a prior distribution such that any algorithm has a Bayesian regret bounded from below by $\frac{1}{20}\sqrt{nK}$. We also study the case of priors for the setting of Bubeck et al. [2013] (where the optimal mean is known as well as a lower bound on the…
