Thompson Sampling for Online Learning with Linear Experts

Aditya Gopalan

arXiv:1311.0468 · stat.ML · November 5, 2013 · 1 citation

TL;DR

This note presents a version of Thompson sampling for online linear learning with full information and shows that it achieves sqrt(T) regret bounds, because it essentially reduces to Kalai and Vempala's Follow-the-Perturbed-Leader strategy with Gaussian noise.

Contribution

The note adapts Thompson sampling, with a Gaussian prior and time-varying Gaussian likelihoods, to the online linear experts setting, links it to Follow-the-Perturbed-Leader, and obtains regret bounds from that link.

Findings

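01. Thompson sampling with a Gaussian prior and time-varying Gaussian likelihoods achieves sqrt(T) regret in the full-information setting.

02. The algorithm essentially reduces to Kalai and Vempala's Follow-the-Perturbed-Leader strategy, with Gaussian noise in place of exponentially distributed noise.

03. The regret bounds follow from this reduction.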
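
For reference, the quantities behind these findings can be written out in the notation of Kalai and Vempala's setting. This is a sketch under assumptions: the symbols d_t, s_t, p_t, M, and the variance schedule \sigma_t are introduced here for illustration, and the exact noise scaling used in the note is not stated on this page.

```latex
% Decisions d_t \in D \subset \mathbb{R}^n, states s_t \in S \subset \mathbb{R}^n,
% and the cost incurred at round t is the inner product d_t \cdot s_t.
\[
  \mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} d_t \cdot s_t \;-\; \min_{d \in D} \sum_{t=1}^{T} d \cdot s_t .
\]
% Follow-the-Perturbed-Leader plays the best decision against the perturbed history:
\[
  d_t \;=\; M\!\left(\sum_{\tau=1}^{t-1} s_\tau + p_t\right),
  \qquad
  M(x) \;=\; \operatorname*{arg\,min}_{d \in D} \; d \cdot x .
\]
% Gaussian variant (assumed form): the perturbation is drawn as
\[
  p_t \sim \mathcal{N}\!\left(0, \sigma_t^{2} I\right),
\]
% where \sigma_t is a hypothetical, possibly time-varying, scale.
```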

Abstract

In this note, we present a version of the Thompson sampling algorithm for the problem of online linear generalization with full information (i.e., the experts setting), studied by Kalai and Vempala, 2005. The algorithm uses a Gaussian prior and time-varying Gaussian likelihoods, and we show that it essentially reduces to Kalai and Vempala's Follow-the-Perturbed-Leader strategy, with exponentially distributed noise replaced by Gaussian noise. This implies sqrt(T) regret bounds for Thompson sampling (with time-varying likelihood) for online learning with full information.
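
The Follow-the-Perturbed-Leader strategy with Gaussian noise described in the abstract can be illustrated with a minimal sketch for the special case of N experts. The function name `ftpl_gaussian`, the `sigma` parameter, and the `sigma * sqrt(t)` perturbation scale are assumptions made for illustration; the note's own prior, likelihood variances, and constants may differ.

```python
import numpy as np


def ftpl_gaussian(costs, sigma=1.0, seed=0):
    """Follow-the-Perturbed-Leader over N experts with Gaussian perturbations.

    costs: array of shape (T, N); costs[t, i] is the cost of expert i at round t.
    sigma: hypothetical noise scale; the sigma * sqrt(t) schedule is an assumption.
    Returns the chosen expert per round and the regret against the best fixed expert.
    """
    rng = np.random.default_rng(seed)
    T, N = costs.shape
    cumulative = np.zeros(N)          # total cost of each expert so far
    choices = np.empty(T, dtype=int)

    for t in range(T):
        # Fresh Gaussian perturbation each round. In the Thompson sampling view,
        # this plays the role of sampling from a Gaussian posterior and acting
        # greedily on the sample.
        noise = rng.normal(0.0, sigma * np.sqrt(t + 1), size=N)
        choices[t] = int(np.argmin(cumulative + noise))
        # Full information: every expert's cost is revealed after the choice.
        cumulative += costs[t]

    incurred = costs[np.arange(T), choices].sum()
    regret = incurred - cumulative.min()
    return choices, regret


if __name__ == "__main__":
    demo_costs = np.random.default_rng(1).uniform(size=(2000, 5))
    _, demo_regret = ftpl_gaussian(demo_costs)
    print("regret after 2000 rounds:", demo_regret)
```

Running the sketch for several horizons T and comparing the regret against sqrt(T) gives an informal check of the growth rate, though it is no substitute for the bound proved in the note.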

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms