On Frequentist Regret of Linear Thompson Sampling
Nima Hamidi, Mohsen Bayati

TL;DR
This paper proves that the known frequentist regret bounds for Linear Thompson Sampling are tight due to an inherent bias, and introduces a data-driven adjustment method to achieve optimal regret.
Contribution
It establishes the fundamental limit of frequentist regret bounds for LinTS and proposes a data-driven approach to attain minimax optimality.
Findings
The frequentist regret bound for LinTS is tight at $\widetilde{O}(d\sqrt{dT})$.
Randomization bias can cause linear regret in LinTS without inflation.
Data-driven posterior inflation adjustment can achieve optimal regret.
Abstract
This paper studies the stochastic linear bandit problem, where a decision-maker chooses actions from possibly time-dependent sets of vectors in $\mathbb{R}^d$ and receives noisy rewards. The objective is to minimize regret, the difference between the cumulative expected reward of the decision-maker and that of an oracle with access to the expected reward of each action, over a sequence of $T$ decisions. Linear Thompson Sampling (LinTS) is a popular Bayesian heuristic, supported by theoretical analysis that shows its Bayesian regret is bounded by $\widetilde{O}(d\sqrt{T})$, matching minimax lower bounds. However, previous studies demonstrate that the frequentist regret bound for LinTS is $\widetilde{O}(d\sqrt{dT})$, which requires posterior variance inflation and is by a factor of $\sqrt{d}$ worse than the best optimism-based algorithms. We prove that this inflation…
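To make the role of posterior variance inflation concrete, here is a minimal sketch of LinTS on a fixed action set. It is not the authors' implementation. The function names (`lin_ts`, `cholesky`, `matvec`, `dot`), the `inflation` and `ridge` parameters, the Gaussian prior $N(0, I/\text{ridge})$, and the unit-variance Gaussian noise are all assumptions made for illustration. At each step the sketch samples $\tilde\theta \sim N(\hat\theta, \text{inflation}^2 \cdot V^{-1})$ and plays the action that maximizes $\langle a, \tilde\theta \rangle$; `inflation=1.0` corresponds to sampling from the uninflated posterior under these assumptions.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def cholesky(a):
    """Lower-triangular factor `low` with low @ low.T == a (a symmetric PD)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            # Guard the diagonal against tiny negative values from round-off.
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low


def lin_ts(actions, theta_star, horizon, inflation=1.0, ridge=1.0, seed=0):
    """Run LinTS for `horizon` rounds; return cumulative expected regret.

    Assumptions (illustrative): fixed finite action set, prior N(0, I/ridge),
    unit-variance Gaussian reward noise.
    """
    rng = random.Random(seed)
    d = len(theta_star)
    # V^{-1} starts at (ridge * I)^{-1}; b accumulates reward-weighted actions.
    v_inv = [[1.0 / ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    best = max(dot(a, theta_star) for a in actions)
    regret = 0.0
    for _ in range(horizon):
        theta_hat = matvec(v_inv, b)  # ridge / posterior-mean estimate
        low = cholesky(v_inv)
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        noise = matvec(low, z)  # has covariance V^{-1}
        # Inflated posterior sample: covariance inflation^2 * V^{-1}.
        theta_tilde = [m + inflation * s for m, s in zip(theta_hat, noise)]
        a = max(actions, key=lambda x: dot(x, theta_tilde))
        reward = dot(a, theta_star) + rng.gauss(0.0, 1.0)
        regret += best - dot(a, theta_star)
        # Sherman-Morrison update of V^{-1} after V <- V + a a^T.
        va = matvec(v_inv, a)
        denom = 1.0 + dot(a, va)
        v_inv = [[v_inv[i][j] - va[i] * va[j] / denom for j in range(d)]
                 for i in range(d)]
        b = [bi + reward * ai for bi, ai in zip(b, a)]
    return regret
```

Calling `lin_ts(actions, theta_star, horizon=1000, inflation=1.0)` and then again with a larger `inflation` (for example on the order of $\sqrt{d}$, the scale used in the frequentist analyses) gives a rough way to compare the two regimes on a chosen instance. The data-driven inflation adjustment proposed in the paper is not implemented here, and a toy instance like this one is not expected to exhibit the paper's worst-case behavior.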
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
