Linear Thompson Sampling Revisited

Marc Abeille; Alessandro Lazaric

arXiv:1611.06534·stat.ML·November 6, 2019

TL;DR

This paper provides a new proof for the regret bounds of Thompson sampling in stochastic linear bandits, revealing its relation to optimism and sensitivity, and extends the analysis to related models.

Contribution

The paper offers an alternative proof of the regret bound for Thompson sampling, highlights its connection to optimism, and extends the approach to regularized linear optimization and generalized linear models.

Findings

01

Regret bound of order $\tilde{O}(d^{3/2}\sqrt{T})$ established

02

Thompson sampling can be viewed as a randomized algorithm with a fixed optimism probability

03

Proof technique applicable to regularized linear and generalized linear models

Abstract

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\tilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
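
To make the randomized-algorithm view in the abstract concrete, here is a minimal Python sketch of Thompson sampling for a linear bandit with a finite arm set. It is an illustration under stated assumptions, not the authors' implementation: the function name `linear_thompson_sampling`, the finite arm set, the Gaussian perturbation, and the parameters `lam`, `noise_sd`, and `scale` are all hypothetical choices. In particular, `scale` is a constant stand-in for the confidence radius that the analysis would prescribe.

```python
import numpy as np


def linear_thompson_sampling(arms, theta_star, horizon, lam=1.0,
                             noise_sd=0.1, scale=1.0, seed=0):
    """Run a Gaussian linear TS sketch and return cumulative pseudo-regret.

    arms: (n_arms, d) array of arm features (assumed finite arm set).
    theta_star: (d,) true parameter, used only to simulate rewards.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    means = arms @ theta_star      # true expected reward of each arm
    V = lam * np.eye(d)            # regularized design matrix
    b = np.zeros(d)                # running sum of reward * feature
    gaps = np.empty(horizon)
    for t in range(horizon):
        # Regularized least-squares estimate of the parameter.
        theta_hat = np.linalg.solve(V, b)
        # Perturb the estimate: with V = L L^T, the vector L^{-T} eta
        # has covariance V^{-1} when eta ~ N(0, I).
        L = np.linalg.cholesky(V)
        eta = rng.standard_normal(d)
        theta_tilde = theta_hat + scale * np.linalg.solve(L.T, eta)
        # Play the arm that is optimal for the sampled parameter.
        a = int(np.argmax(arms @ theta_tilde))
        reward = means[a] + noise_sd * rng.standard_normal()
        # Update the sufficient statistics with the observed reward.
        V += np.outer(arms[a], arms[a])
        b += reward * arms[a]
        gaps[t] = means.max() - means[a]
    return np.cumsum(gaps)
```

Calling `linear_thompson_sampling(np.eye(3), np.array([1.0, 0.5, 0.0]), horizon=200)` returns an array of 200 cumulative pseudo-regret values. Increasing `scale` widens the sampling distribution, which raises the chance that the sampled parameter is optimistic but also increases exploration.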
