Thompson Sampling for Online Learning with Linear Experts
Aditya Gopalan

TL;DR
This paper presents a Thompson sampling algorithm for online linear learning with full information and shows that it achieves sqrt(T) regret by relating it to Kalai and Vempala's Follow-the-Perturbed-Leader strategy.
Contribution
It adapts Thompson sampling with Gaussian priors and likelihoods to the online linear experts setting, providing regret bounds and linking it to existing algorithms.
Findings
Thompson sampling with Gaussian noise achieves sqrt(T) regret.
The algorithm essentially reduces to Kalai and Vempala's Follow-the-Perturbed-Leader strategy, with Gaussian rather than exponentially distributed perturbations.
The regret bound follows from this reduction.
Abstract
In this note, we present a version of the Thompson sampling algorithm for the problem of online linear generalization with full information (i.e., the experts setting), studied by Kalai and Vempala, 2005. The algorithm uses a Gaussian prior and time-varying Gaussian likelihoods, and we show that it essentially reduces to Kalai and Vempala's Follow-the-Perturbed-Leader strategy, with exponentially distributed noise replaced by Gaussian noise. This implies sqrt(T) regret bounds for Thompson sampling (with time-varying likelihood) for online learning with full information.
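The reduction described in the abstract can be illustrated with a minimal sketch of Follow-the-Perturbed-Leader using Gaussian perturbations. This is not the paper's exact algorithm: the function name `gaussian_fpl`, the finite decision set, the `scale` parameter, and the choice of a noise standard deviation growing like sqrt(t) are all assumptions made for illustration. Each round, the learner adds fresh Gaussian noise to the cumulative loss vector and plays the decision that minimizes the perturbed cumulative loss.

```python
import numpy as np


def gaussian_fpl(decisions, losses, scale=1.0, seed=0):
    """Follow-the-Perturbed-Leader with Gaussian noise (illustrative sketch).

    decisions: array of shape (n, d), one candidate decision vector per row.
    losses:    array of shape (T, d), the loss vector revealed after each round.
    scale:     hypothetical tuning constant; the noise std at round t is
               scale * sqrt(t), an assumption rather than the paper's schedule.
    Returns (choices, total_loss, regret against the best fixed decision).
    """
    rng = np.random.default_rng(seed)
    decisions = np.asarray(decisions, dtype=float)
    losses = np.asarray(losses, dtype=float)

    cumulative = np.zeros(decisions.shape[1])  # sum of loss vectors seen so far
    choices = []
    total_loss = 0.0

    for t, loss in enumerate(losses, start=1):
        # Fresh Gaussian perturbation of the cumulative loss vector.
        noise = rng.normal(0.0, scale * np.sqrt(t), size=cumulative.shape)
        # Play the decision that looks best under the perturbed history.
        idx = int(np.argmin(decisions @ (cumulative + noise)))
        choices.append(idx)
        total_loss += float(decisions[idx] @ loss)
        # Full information: the whole loss vector is observed after playing.
        cumulative += loss

    best_fixed = float(np.min(decisions @ losses.sum(axis=0)))
    return choices, total_loss, total_loss - best_fixed


if __name__ == "__main__":
    # Three experts encoded as standard basis vectors, random losses in [0, 1].
    T = 500
    demo_losses = np.random.default_rng(1).uniform(size=(T, 3))
    _, total, regret = gaussian_fpl(np.eye(3), demo_losses)
    print(f"total loss: {total:.2f}, regret: {regret:.2f}")
```

Setting `scale=0` removes the perturbation and recovers plain Follow-the-Leader, which is a convenient way to check the bookkeeping. In the Thompson sampling reading, the perturbed cumulative loss plays the role of a sample from a Gaussian posterior, so choosing the minimizer of the sample corresponds to acting greedily on a posterior draw.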
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
