On Thompson Sampling with Langevin Algorithms

Eric Mazumdar; Aldo Pacchiano; Yi-an Ma; Peter L. Bartlett; Michael I. Jordan

arXiv:2002.10002 · cs.LG · June 19, 2020 · 6 cites

TL;DR

This paper introduces two Langevin-based MCMC algorithms that reduce the computational cost of Thompson sampling in multi-armed bandit problems while still achieving logarithmic regret.

Contribution

It develops quickly converging Langevin algorithms with accuracy guarantees for approximate posterior sampling in Thompson sampling, at a computational cost that does not depend on the time horizon.
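
For orientation, the textbook unadjusted Langevin update that samplers of this kind build on can be written as below. The symbols are assumptions of this sketch, not notation taken from this page: $U$ is the negative log-posterior of an arm's parameter, $h$ is the step size, and $\xi_k$ is standard Gaussian noise. The paper's own algorithms and hyperparameter choices may differ.

```latex
\theta_{k+1} = \theta_k - h\,\nabla U(\theta_k) + \sqrt{2h}\,\xi_k,
\qquad \xi_k \sim \mathcal{N}(0, I)
```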

Findings

01 Algorithms achieve logarithmic regret.

02 Computational complexity is independent of the time horizon.

03 Each round needs only a constant number of iterations and a constant amount of data.

Abstract

Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice. However, it suffers from a significant limitation computationally, arising from the need for samples from posterior distributions at every iteration. We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue. We construct quickly converging Langevin algorithms to generate approximate samples that have accuracy guarantees, and we leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the necessary hyperparameters for the MCMC procedure to guarantee optimal instance-dependent frequentist regret while having low computational complexity. In particular, our algorithms take advantage of both posterior concentration and a sample reuse…
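
The idea of replacing exact posterior draws with Langevin iterates can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes Gaussian rewards with known unit noise variance, a Gaussian prior on each arm's mean, a fixed number of unadjusted Langevin steps per round, and a chain warm-started from the arm's previous sample. All function names, the step-size rule, and the defaults are hypothetical choices made for this example.

```python
import numpy as np


def grad_potential(theta, n, sum_x, prior_var=1.0, noise_var=1.0):
    """Gradient of the negative log-posterior of a Gaussian mean (assumed model)."""
    return theta / prior_var + (n * theta - sum_x) / noise_var


def langevin_sample(theta0, n, sum_x, rng, n_steps=50, prior_var=1.0, noise_var=1.0):
    """Run unadjusted Langevin steps from theta0 and return the last iterate."""
    # Smoothness constant of the potential; the step shrinks as data accumulate.
    smoothness = 1.0 / prior_var + n / noise_var
    step = 0.1 / smoothness  # illustrative choice, not a tuned value
    theta = theta0
    for _ in range(n_steps):
        drift = grad_potential(theta, n, sum_x, prior_var, noise_var)
        theta = theta - step * drift + np.sqrt(2.0 * step) * rng.standard_normal()
    return theta


def run_bandit(true_means, horizon, seed=0):
    """Approximate Thompson sampling with one Langevin chain per arm."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)   # number of pulls per arm
    sums = np.zeros(k)     # sum of observed rewards per arm
    samples = np.zeros(k)  # last Langevin iterate per arm (warm start)
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one approximate posterior sample per arm.
        for a in range(k):
            samples[a] = langevin_sample(samples[a], counts[a], sums[a], rng)
        arm = int(np.argmax(samples))  # act greedily on the sampled means
        reward = true_means[arm] + rng.standard_normal()
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]
    return counts, regret


if __name__ == "__main__":
    counts, regret = run_bandit([0.0, 1.0], horizon=500)
    print("pulls per arm:", counts, "cumulative regret:", regret)
```

In this toy model the gradient depends only on the running count and reward sum, so each round costs a fixed number of Langevin steps regardless of how many rounds have elapsed. For posteriors without such sufficient statistics, a comparable per-round cost would require subsampling the data, which this sketch does not attempt.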

Videos

On Thompson Sampling with Langevin Algorithms · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques · Bayesian Methods and Mixture Models