Prior Diffusiveness and Regret in the Linear-Gaussian Bandit

Yifan Zhu; John C. Duchi; Benjamin Van Roy

arXiv:2601.02022 · cs.LG · January 6, 2026

TL;DR

This paper analyzes the Bayesian regret of Thompson sampling in the linear-Gaussian bandit, showing that the prior-dependent burn-in term decouples additively from the long-run regret, and introduces a new elliptical potential lemma.

Contribution

It proves a new regret bound in which the prior-dependent burn-in term decouples additively from the long-run (minimax) regret, and introduces an elliptical potential lemma used in the analysis.
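To make the term concrete, here is a minimal numerical sketch of the classical elliptical potential lemma from the linear-bandit literature, assuming NumPy. With $V_0 = \lambda I$ and $V_t = V_{t-1} + x_t x_t^\top$ for actions satisfying $\|x_t\|_2 \le r$, the classical statement is $\sum_{t=1}^T \min(1, \|x_t\|^2_{V_{t-1}^{-1}}) \le 2\log\frac{\det V_T}{\det(\lambda I)} \le 2d\log(1 + T r^2/(d\lambda))$. This is the standard textbook version, not the new lemma introduced in this paper; the function name `elliptical_potential`, the regularizer $\lambda$, and the random actions on the sphere of radius $r$ are illustrative assumptions.

```python
import numpy as np


def elliptical_potential(actions, lam):
    """Classical elliptical potential (not this paper's new lemma).

    Returns (sum_t min(1, ||x_t||^2_{V_{t-1}^{-1}}), 2 * log(det V_T / det(lam * I))),
    where V_0 = lam * I and V_t = V_{t-1} + x_t x_t^T.
    """
    d = actions.shape[1]
    V = lam * np.eye(d)
    total = 0.0
    for x in actions:
        w = x @ np.linalg.solve(V, x)   # squared norm of x in the V^{-1} geometry
        total += min(1.0, w)
        V += np.outer(x, x)             # rank-one update V_t = V_{t-1} + x x^T
    _, logdet_V = np.linalg.slogdet(V)
    logdet_bound = 2.0 * (logdet_V - d * np.log(lam))
    return total, logdet_bound


# Hypothetical example: T random actions of l2 norm r in dimension d.
rng = np.random.default_rng(0)
d, T, r, lam = 5, 2000, 1.0, 1.0
X = rng.normal(size=(T, d))
X = r * X / np.linalg.norm(X, axis=1, keepdims=True)
total, bound = elliptical_potential(X, lam)
closed_form = 2 * d * np.log(1 + T * r**2 / (d * lam))
print(total, bound, closed_form)
```

The matrix determinant lemma, $\det V_t = \det V_{t-1}\,(1 + \|x_t\|^2_{V_{t-1}^{-1}})$, turns the sum of potentials into a log-determinant, and the closed form follows from the AM-GM inequality applied to the eigenvalues of $V_T$. The potential grows only logarithmically in $T$, which is what makes such lemmas useful in regret analyses.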

Findings

1. A Bayesian regret bound in which the prior-dependent term decouples additively
2. A new elliptical potential lemma
3. A lower bound showing that the burn-in term is unavoidable

Abstract

We prove that Thompson sampling exhibits $\tilde{O}(\sigma d \sqrt{T} + d r \sqrt{\mathrm{Tr}(\Sigma_0)})$ Bayesian regret in the linear-Gaussian bandit with a $\mathcal{N}(\mu_0, \Sigma_0)$ prior distribution on the coefficients, where $d$ is the dimension, $T$ is the time horizon, $r$ is the maximum $\ell_2$ norm of the actions, and $\sigma^2$ is the noise variance. In contrast to existing regret bounds, this shows that to within logarithmic factors, the prior-dependent "burn-in" term $d r \sqrt{\mathrm{Tr}(\Sigma_0)}$ decouples additively from the minimax (long run) regret $\sigma d \sqrt{T}$. Previous regret bounds exhibit a multiplicative dependence on these terms. We establish these results via a new "elliptical potential" lemma, and also provide a lower bound indicating that the burn-in term is unavoidable.
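The setting described in the abstract can be simulated directly. The following is a minimal sketch, assuming NumPy, a fixed finite action set whose vectors all have $\ell_2$ norm $r$, and the textbook conjugate-Gaussian form of Thompson sampling: sample $\tilde\theta \sim \mathcal{N}(\mu_{t-1}, \Sigma_{t-1})$, play the action maximizing $x^\top \tilde\theta$, then update the Gaussian posterior. It is not the authors' code; the function name `thompson_sampling_regret` and all simulation parameters are hypothetical. Bayesian regret is estimated by averaging the pseudo-regret $\sum_t (\max_x x^\top\theta - x_t^\top\theta)$ over coefficient vectors $\theta$ drawn from the $\mathcal{N}(\mu_0, \Sigma_0)$ prior.

```python
import numpy as np


def thompson_sampling_regret(actions, mu0, Sigma0, sigma, T, rng):
    """Run Thompson sampling once on a linear-Gaussian bandit.

    Returns the cumulative pseudo-regret sum_t (max_x x^T theta - x_t^T theta)
    for a coefficient vector theta drawn from the N(mu0, Sigma0) prior.
    """
    theta = rng.multivariate_normal(mu0, Sigma0)   # true coefficients ~ prior
    means = actions @ theta                         # mean reward of each action
    best = means.max()
    mu = np.array(mu0, dtype=float)
    Sigma = np.array(Sigma0, dtype=float)
    regret = 0.0
    for _ in range(T):
        theta_tilde = rng.multivariate_normal(mu, Sigma)  # one posterior sample
        i = int(np.argmax(actions @ theta_tilde))         # act greedily on the sample
        x = actions[i]
        y = means[i] + sigma * rng.normal()               # noisy linear reward
        regret += best - means[i]
        # Conjugate Gaussian update (rank-one, Kalman-style), equivalent to
        # Sigma_t^{-1} = Sigma_{t-1}^{-1} + x x^T / sigma^2.
        Sx = Sigma @ x
        gain = Sx / (sigma**2 + x @ Sx)
        mu = mu + gain * (y - x @ mu)
        Sigma = Sigma - np.outer(gain, Sx)
        Sigma = 0.5 * (Sigma + Sigma.T)                   # guard against round-off asymmetry
    return regret


# Hypothetical simulation parameters.
rng = np.random.default_rng(0)
d, K, r, sigma, T = 4, 50, 1.0, 0.5, 500
A = rng.normal(size=(K, d))
A = r * A / np.linalg.norm(A, axis=1, keepdims=True)   # K actions on the sphere of radius r
mu0, Sigma0 = np.zeros(d), np.eye(d)

runs = [thompson_sampling_regret(A, mu0, Sigma0, sigma, T, rng) for _ in range(20)]
bayes_regret = float(np.mean(runs))                     # Monte Carlo Bayesian regret estimate
print(bayes_regret)
```

Varying `Sigma0` (for example, scaling it up to make the prior more diffuse) while holding `sigma`, `d`, and `T` fixed is one informal way to probe how the prior-dependent burn-in term affects regret. A small Monte Carlo run like this gives only a noisy estimate and does not verify the bound.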
