On Thompson Sampling with Langevin Algorithms
Eric Mazumdar, Aldo Pacchiano, Yi-an Ma, Peter L. Bartlett, Michael I. Jordan

TL;DR
This paper introduces Langevin-based MCMC algorithms to improve the computational efficiency of Thompson sampling in multi-armed bandit problems, achieving logarithmic regret with low complexity.
Contribution
The paper develops Langevin algorithms with convergence guarantees for approximate posterior sampling in Thompson sampling, so that the per-round computational cost does not grow with the time horizon.
Findings
The algorithms achieve logarithmic regret.
Per-round computational complexity is independent of the time horizon.
Each round needs only a constant number of MCMC iterations and a constant amount of data.
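The findings above can be made concrete with a small sketch. It runs approximate Thompson sampling on a toy Gaussian bandit, drawing each arm's parameter with a fixed number of unadjusted Langevin steps that are warm-started from the previous round's sample. Everything specific here is an assumption for illustration and not taken from the paper: the unit-variance Gaussian rewards, the N(0, 1) prior, the step size `0.1 / smoothness`, the default of 20 steps, and the function names `ula_sample` and `langevin_thompson`. It also uses the full gradient through sufficient statistics rather than a stochastic-gradient variant, and it does not reproduce the paper's hyperparameter choices.

```python
import math
import random


def ula_sample(theta0, n_obs, sum_rewards, n_steps, rng):
    """Approximate posterior draw for one arm via unadjusted Langevin steps.

    Assumed model (illustrative only): rewards ~ N(theta, 1), prior
    theta ~ N(0, 1), so the potential (negative log posterior) is
    U(theta) = 0.5 * theta**2 + 0.5 * sum((x_i - theta)**2).
    """
    smoothness = n_obs + 1.0   # U''(theta) for this Gaussian model
    step = 0.1 / smoothness    # assumed step size, well below 2 / smoothness
    theta = theta0
    for _ in range(n_steps):
        grad = smoothness * theta - sum_rewards   # U'(theta)
        noise = math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        theta = theta - step * grad + noise
    return theta


def langevin_thompson(true_means, horizon, n_steps=20, seed=0):
    """Thompson sampling where posterior draws come from ula_sample."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of observed rewards per arm
    thetas = [0.0] * k    # last sample per arm, reused as a warm start
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Fixed number of Langevin steps per arm, regardless of the round.
        for a in range(k):
            thetas[a] = ula_sample(thetas[a], counts[a], sums[a], n_steps, rng)
        arm = max(range(k), key=lambda a: thetas[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]
    return counts, regret


if __name__ == "__main__":
    counts, regret = langevin_thompson([0.0, 1.0], horizon=1000)
    print("pulls per arm:", counts)
    print("cumulative regret:", regret)
```

Because the step size shrinks as an arm accumulates observations, the same number of steps per round stays adequate as the posterior concentrates, which is the intuition behind a per-round cost that does not depend on the horizon. The sampler is biased for any fixed step size, so treat the draws as approximate.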
Abstract
Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice. However, it suffers from a significant limitation computationally, arising from the need for samples from posterior distributions at every iteration. We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue. We construct quickly converging Langevin algorithms to generate approximate samples that have accuracy guarantees, and we leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the necessary hyperparameters for the MCMC procedure to guarantee optimal instance-dependent frequentist regret while having low computational complexity. In particular, our algorithms take advantage of both posterior concentration and a sample reuse…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques · Bayesian Methods and Mixture Models
