On Frequentist Regret of Linear Thompson Sampling

Nima Hamidi; Mohsen Bayati

arXiv:2006.06790 · cs.LG · April 24, 2023

TL;DR

This paper proves that the known frequentist regret bound for Linear Thompson Sampling (LinTS) is tight because of an inherent randomization bias, and introduces a data-driven adjustment of posterior inflation that can achieve optimal regret.

Contribution

It establishes the fundamental limit of frequentist regret bounds for LinTS and proposes a data-driven approach to attain minimax optimality.

Findings

01

Frequentist regret bound for LinTS is tight at $\widetilde{O}(d\sqrt{dT})$.

02

Randomization bias can cause linear regret in LinTS without inflation.

03

Data-driven posterior inflation adjustment can achieve optimal regret.

Abstract

This paper studies the stochastic linear bandit problem, where a decision-maker chooses actions from possibly time-dependent sets of vectors in $\mathbb{R}^{d}$ and receives noisy rewards. The objective is to minimize regret, the difference between the cumulative expected reward of the decision-maker and that of an oracle with access to the expected reward of each action, over a sequence of $T$ decisions. Linear Thompson Sampling (LinTS) is a popular Bayesian heuristic, supported by theoretical analysis that shows its Bayesian regret is bounded by $\widetilde{O}(d\sqrt{T})$, matching minimax lower bounds. However, previous studies demonstrate that the frequentist regret bound for LinTS is $\widetilde{O}(d\sqrt{dT})$, which requires posterior variance inflation and is by a factor of $\sqrt{d}$ worse than the best optimism-based algorithms. We prove that this inflation…
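To make the setup in the abstract concrete, here is a minimal sketch of LinTS with a posterior inflation factor. It rests on assumptions the abstract does not state: a Gaussian prior, Gaussian reward noise, and a fixed scalar `inflation` that scales the posterior standard deviation. The function name `lin_ts` and all parameter names are hypothetical, and this is not the paper's data-driven adjustment, only the plain inflated sampler that the abstract refers to.

```python
import numpy as np

def lin_ts(theta_star, action_sets, inflation=1.0, noise_sd=1.0,
           prior_var=1.0, seed=0):
    """Illustrative LinTS run; returns cumulative regret after each round.

    Assumes a N(0, prior_var * I) prior and N(0, noise_sd^2) reward noise.
    `inflation` multiplies the posterior standard deviation used for sampling;
    inflation=1.0 samples from the unmodified posterior.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    precision = np.eye(d) / prior_var   # posterior precision matrix
    b = np.zeros(d)                     # precision-weighted reward statistics
    cumulative, total = [], 0.0
    for actions in action_sets:         # each entry: array of shape (k, d)
        cov = np.linalg.inv(precision)
        mean = cov @ b
        # Sample from the posterior with covariance inflation**2 * cov.
        chol = np.linalg.cholesky(cov)
        theta_tilde = mean + inflation * (chol @ rng.standard_normal(d))
        # Act greedily with respect to the sampled parameter.
        x = actions[int(np.argmax(actions @ theta_tilde))]
        reward = x @ theta_star + noise_sd * rng.standard_normal()
        # Bayesian linear regression update.
        precision += np.outer(x, x) / noise_sd**2
        b += reward * x / noise_sd**2
        # Regret against the oracle that knows theta_star.
        total += np.max(actions @ theta_star) - x @ theta_star
        cumulative.append(total)
    return np.array(cumulative)

# Example: two orthogonal actions offered in every round, d = 2.
theta = np.array([1.0, 0.0])
regret = lin_ts(theta, [np.eye(2)] * 50, inflation=1.0)
```

Rerunning the example with a larger `inflation` widens the sampling distribution, which is the lever that the frequentist analyses discussed above rely on.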

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems