Prior Diffusiveness and Regret in the Linear-Gaussian Bandit
Yifan Zhu, John C. Duchi, Benjamin Van Roy

TL;DR
This paper analyzes the Bayesian regret of Thompson sampling in the linear-Gaussian bandit, showing that the prior-dependent burn-in term decouples additively from the long-run regret, and introduces a new elliptical potential lemma.
Contribution
It provides a new Bayesian regret bound in which the prior-dependent burn-in term and the long-run (minimax) regret add rather than multiply, and it introduces an elliptical potential lemma used in the analysis.
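For context only: the paper's new elliptical potential lemma is not stated on this page, so the sketch below checks the classical elliptical potential lemma from the linear-bandit literature instead. With V_0 = λI, V_t = V_{t-1} + x_t x_tᵀ and ‖x_t‖₂ ≤ r, it states that Σ_t min(1, ‖x_t‖²_{V_{t-1}⁻¹}) ≤ 2 log(det V_T / det V_0) ≤ 2d log(1 + T r²/(dλ)). In the linear-Gaussian model, V_t equals σ² times the posterior precision, so under an isotropic prior N(0, s²I) one has λ = σ²/s²; a more diffuse prior (larger s) gives a smaller λ and a larger log term, which is one place prior scale can enter a standard analysis. The function names `elliptical_potential`, `classical_bound` and `demo` are hypothetical, and the random unit-sphere actions are an assumption made for illustration.

```python
import math
import random

def elliptical_potential(xs, lam):
    """Classical elliptical potential for V_0 = lam * I, V_t = V_{t-1} + x_t x_t^T.

    Returns (sum_t min(1, ||x_t||^2_{V_{t-1}^{-1}}), log det V_T - log det V_0).
    """
    d = len(xs[0])
    # Track V^{-1} directly; it starts as (1 / lam) * I.
    v_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    potential, log_det_gain = 0.0, 0.0
    for x in xs:
        vx = [sum(v_inv[i][k] * x[k] for k in range(d)) for i in range(d)]
        w = sum(x[i] * vx[i] for i in range(d))  # ||x||^2 in the V^{-1} norm
        potential += min(1.0, w)
        # Matrix determinant lemma: det(V + x x^T) = det(V) * (1 + w).
        log_det_gain += math.log1p(w)
        # Sherman-Morrison: (V + x x^T)^{-1} = V^{-1} - (V^{-1} x)(V^{-1} x)^T / (1 + w).
        v_inv = [[v_inv[i][j] - vx[i] * vx[j] / (1.0 + w) for j in range(d)]
                 for i in range(d)]
    return potential, log_det_gain

def classical_bound(d, horizon, r, lam):
    """2 d log(1 + T r^2 / (d lam)), valid when every ||x_t||_2 <= r."""
    return 2.0 * d * math.log1p(horizon * r * r / (d * lam))

def demo(d=4, horizon=500, r=1.0, lam=1.0, seed=0):
    """Hypothetical check on random actions placed on the sphere of radius r."""
    rng = random.Random(seed)
    xs = []
    for _ in range(horizon):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        xs.append([r * c / norm for c in v])
    potential, gain = elliptical_potential(xs, lam)
    return potential, 2.0 * gain, classical_bound(d, horizon, r, lam)

if __name__ == "__main__":
    p, two_log_det, bound = demo()
    print(f"potential={p:.3f}  2*log-det ratio={two_log_det:.3f}  bound={bound:.3f}")
```

The first inequality holds term by term because min(1, w) ≤ 2 log(1 + w) for all w ≥ 0; the second follows from the AM-GM inequality applied to the eigenvalues of V_T.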
Findings
Bayesian regret bound with an additively decoupled prior-dependent term
A new elliptical potential lemma
A lower bound showing that the burn-in term is unavoidable
Abstract
We prove that Thompson sampling exhibits $\tilde{O}(\sigma d \sqrt{T} + d r \sqrt{\mathrm{Tr}(\Sigma_0)})$ Bayesian regret in the linear-Gaussian bandit with a $\mathcal{N}(\mu_0, \Sigma_0)$ prior distribution on the coefficients, where $d$ is the dimension, $T$ is the time horizon, $r$ is the maximum $\ell_2$ norm of the actions, and $\sigma^2$ is the noise variance. In contrast to existing regret bounds, this shows that, to within logarithmic factors, the prior-dependent "burn-in" term $d r \sqrt{\mathrm{Tr}(\Sigma_0)}$ decouples additively from the minimax (long-run) regret $\sigma d \sqrt{T}$. Previous regret bounds exhibit a multiplicative dependence on these terms. We establish these results via a new "elliptical potential" lemma, and also provide a lower bound indicating that the burn-in term is unavoidable.
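The setting in the abstract can be made concrete with a small simulation. The sketch below is a minimal pure-Python illustration, not the authors' code. It assumes a finite action set on the sphere of radius r, a Gaussian prior N(mu0, cov0) on the coefficient vector theta, and Gaussian reward noise with variance sigma^2. Each round draws theta_hat from the current posterior, plays the action maximizing a·theta_hat, and applies the conjugate Gaussian posterior update in rank-one form; averaging cumulative regret over independent prior draws of theta gives a Monte Carlo estimate of Bayesian regret. All function names (`thompson_sampling_regret`, `bayesian_regret`, `random_actions`) are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cholesky(a):
    """Lower-triangular L with L L^T = a, for small symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(max(s, 1e-18))  # floor guards against round-off
            else:
                L[i][j] = s / L[j][j]
    return L

def sample_gaussian(mu, cov, rng):
    """Draw one sample from N(mu, cov) via its Cholesky factor."""
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

def thompson_sampling_regret(actions, mu0, cov0, sigma, horizon, rng):
    """Cumulative regret of one Thompson sampling run with theta drawn from the prior."""
    theta = sample_gaussian(mu0, cov0, rng)  # Bayesian setting: true theta ~ prior
    best = max(dot(a, theta) for a in actions)
    mu, cov = list(mu0), [row[:] for row in cov0]
    d = len(mu0)
    regret = 0.0
    for _ in range(horizon):
        theta_hat = sample_gaussian(mu, cov, rng)  # one posterior sample
        x = max(actions, key=lambda a: dot(a, theta_hat))  # act greedily on it
        y = dot(x, theta) + rng.gauss(0.0, sigma)  # noisy linear reward
        regret += best - dot(x, theta)
        # Conjugate Gaussian update in rank-one (Kalman) form.
        cx = [dot(row, x) for row in cov]  # Sigma x
        denom = sigma ** 2 + dot(x, cx)
        gain = [c / denom for c in cx]
        resid = y - dot(x, mu)
        mu = [m + g * resid for m, g in zip(mu, gain)]
        cov = [[cov[i][j] - gain[i] * cx[j] for j in range(d)] for i in range(d)]
    return regret

def bayesian_regret(actions, mu0, cov0, sigma, horizon, n_runs=20, seed=0):
    """Monte Carlo estimate of Bayesian regret: average over prior draws of theta."""
    rng = random.Random(seed)
    total = sum(thompson_sampling_regret(actions, mu0, cov0, sigma, horizon, rng)
                for _ in range(n_runs))
    return total / n_runs

def random_actions(n, d, r, rng):
    """n actions spread on the sphere of radius r (a hypothetical action set)."""
    acts = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(v, v))
        acts.append([r * c / norm for c in v])
    return acts

if __name__ == "__main__":
    d = 3
    acts = random_actions(20, d, 1.0, random.Random(1))
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    print(bayesian_regret(acts, [0.0] * d, identity, sigma=0.5, horizon=200))
```

Scaling cov0 up or down is one way to explore how prior diffusiveness affects early-round regret in this toy setting; the sketch makes no claim about matching the constants or logarithmic factors in the paper's bound.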
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference
