The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling
Nima Hamidi, Mohsen Bayati

TL;DR
This paper extends the elliptical potential lemma to non-Gaussian noise and priors in linear bandits, enabling improved regret bounds for Thompson sampling with general distributions.
Contribution
It introduces a generalized elliptical potential lemma that relaxes Gaussian assumptions, broadening its applicability in sequential learning algorithms.
Findings
Provides a non-Gaussian elliptical potential lemma.
Proves an improved Bayesian regret bound for Thompson sampling.
Achieves minimax optimal regret bounds up to constants.
Abstract
In this note, we introduce a general version of the well-known elliptical potential lemma that is a widely used technique in the analysis of algorithms in sequential learning and decision-making problems. We consider a stochastic linear bandit setting where a decision-maker sequentially chooses among a set of given actions, observes their noisy rewards, and aims to maximize her cumulative expected reward over a decision-making horizon. The elliptical potential lemma is a key tool for quantifying uncertainty in estimating parameters of the reward function, but it requires the noise and the prior distributions to be Gaussian. Our general elliptical potential lemma relaxes this Gaussian requirement which is a highly non-trivial extension for a number of reasons; unlike the Gaussian case, there is no closed-form solution for the covariance matrix of the posterior distribution, the…
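For orientation, here is a minimal numerical sketch of the classical elliptical potential lemma that the abstract refers to. It is not the generalized lemma of this paper. It assumes a regularized design matrix V_t = lam * I + sum of x_s x_s^T over s <= t, and actions with norm at most 1. Under a Gaussian prior N(0, I / lam) and unit-variance Gaussian noise, the inverse of V_t is the posterior covariance, which is the closed form the abstract says is lost in the general case. The function name `elliptical_potential` and all parameter values are illustrative assumptions.

```python
import numpy as np

def elliptical_potential(actions, lam=1.0):
    """Return (potential_sum, log_det_ratio) for a sequence of action vectors.

    potential_sum = sum_t min(1, x_t^T V_{t-1}^{-1} x_t)
    log_det_ratio = log det(V_T) - log det(V_0), with V_0 = lam * I
    """
    T, d = actions.shape
    V = lam * np.eye(d)
    potential = 0.0
    for x in actions:
        # Squared norm of x weighted by the inverse of the current design matrix.
        w = x @ np.linalg.solve(V, x)
        potential += min(1.0, w)
        # Rank-one update after playing action x.
        V += np.outer(x, x)
    _, logdet = np.linalg.slogdet(V)
    log_det_ratio = logdet - d * np.log(lam)
    return potential, log_det_ratio

# Hypothetical example: random actions clipped to the unit ball (norm bound 1).
rng = np.random.default_rng(0)
T, d, lam = 500, 5, 1.0
X = rng.normal(size=(T, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))

potential, log_det_ratio = elliptical_potential(X, lam)
# Classical worst-case bound for norm bound 1: 2 * d * log(1 + T / (d * lam)).
bound = 2 * d * np.log(1 + T / (d * lam))

print(f"potential sum      = {potential:.3f}")
print(f"2 * log-det ratio  = {2 * log_det_ratio:.3f}")
print(f"2 d log(1 + T/(d lam)) = {bound:.3f}")
```

In the classical setting, the potential sum never exceeds twice the log-determinant ratio, which in turn never exceeds the last quantity. That logarithmic growth in T is what regret analyses of linear bandit algorithms typically rely on.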
Taxonomy
Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics · Advanced Causal Inference Techniques
