Prior-free and prior-dependent regret bounds for Thompson Sampling
Sébastien Bubeck, Che-Yu Liu

TL;DR
This paper analyzes the Bayesian regret of Thompson Sampling in stochastic multi-armed bandits. It establishes an optimal prior-free bound and shows that, for certain priors, the regret stays uniformly bounded over time.
Contribution
It proves that Thompson Sampling achieves optimal prior-free regret bounds and shows how specific priors can lead to uniformly bounded regret over time.
Findings
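For any prior distribution, the Bayesian regret of Thompson Sampling is at most 14√(nK), where n is the number of rounds and K the number of arms. This is optimal up to a constant factor.
There exists a prior distribution under which any algorithm incurs a Bayesian regret of at least (1/20)√(nK).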
With certain priors, Thompson Sampling's regret is bounded uniformly over time.
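To make the first finding concrete, here is a minimal sketch of Thompson Sampling in Python. It assumes Bernoulli rewards and a uniform Beta(1, 1) prior on each arm's mean, which is only one convenient instance and not the general setting of the paper. The function names `thompson_sampling_bernoulli` and `estimate_bayesian_regret` are hypothetical and chosen for illustration. The second function approximates the Bayesian regret by averaging over arm means drawn from the same uniform prior, so the result can be compared with 14√(nK).

```python
import math
import random


def thompson_sampling_bernoulli(means, horizon, rng):
    """Run Beta-Bernoulli Thompson Sampling and return the pseudo-regret.

    Assumption: rewards are Bernoulli with the given means, and each arm
    starts from a uniform Beta(1, 1) prior.
    """
    k = len(means)
    successes = [0] * k  # number of rewards equal to 1, per arm
    failures = [0] * k   # number of rewards equal to 0, per arm
    best_mean = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        # Pseudo-regret: gap between the best mean and the played arm's mean.
        regret += best_mean - means[arm]
    return regret


def estimate_bayesian_regret(num_arms, horizon, num_runs, seed=0):
    """Monte Carlo estimate of the Bayesian regret under a uniform prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_runs):
        # Draw the arm means from the prior, then run the algorithm on them.
        means = [rng.random() for _ in range(num_arms)]
        total += thompson_sampling_bernoulli(means, horizon, rng)
    return total / num_runs


if __name__ == "__main__":
    n, k = 1000, 2
    estimate = estimate_bayesian_regret(k, n, num_runs=50)
    print("estimated Bayesian regret:", estimate)
    print("prior-free upper bound 14*sqrt(nK):", 14 * math.sqrt(n * k))
```

Since each round contributes at most 1 to the regret, the bound 14√(nK) is only informative when it is smaller than n, that is, when n > 196K. The example uses n = 1000 and K = 2 for that reason.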
Abstract
We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We are interested in studying prior-free and prior-dependent regret bounds, very much in the same spirit as the usual distribution-free and distribution-dependent bounds for the non-Bayesian stochastic bandit. Building on the techniques of Audibert and Bubeck [2009] and Russo and Roy [2013] we first show that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by 14√(nK). This result is unimprovable in the sense that there exists a prior distribution such that any algorithm has a Bayesian regret bounded from below by (1/20)√(nK). We also study the case of priors for the setting of Bubeck et al. [2013] (where the optimal mean is known as well as a lower bound on the…
