Optimism Stabilizes Thompson Sampling for Adaptive Inference

Shunxing Yan; Han Zhong

arXiv:2602.06014·cs.LG·February 6, 2026

Open Access

TL;DR

This paper shows that adding optimism to Thompson sampling restores stability in multi-armed bandits, which permits valid asymptotic inference at only a mild cost in regret.

Contribution

The paper extends the stability guarantee for variance-inflated Thompson sampling from the two-armed setting to general K-armed bandits, including the case of multiple optimal arms, and introduces an alternative optimistic modification that is also stable.

Findings

01. Variance-inflated TS is stable for any number of arms.

02. Optimistic modifications enable valid asymptotic inference.

03. Stability is achieved with only mild regret cost.

Abstract

Thompson sampling (TS) is widely used for stochastic multi-armed bandits, yet its inferential properties under adaptive data collection are subtle. Classical asymptotic theory for sample means can fail because arm-specific sample sizes are random and coupled with the rewards through the action-selection rule. We study this phenomenon in the $K$-armed Gaussian bandit and identify optimism as a key mechanism for restoring stability, a sufficient condition for valid asymptotic inference requiring each arm's pull count to concentrate around a deterministic scale. First, we prove that variance-inflated TS \citep{halder2025stable} is stable for any $K \geq 2$, including the challenging regime where multiple arms are optimal. This resolves the open question raised by \citet{halder2025stable} through extending their results from the two-armed setting to the general $K$-armed…
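
To make the setting in the abstract concrete, the following is a minimal sketch of Gaussian Thompson sampling with an inflated sampling variance, under stated assumptions: unit-variance rewards, a flat prior, and a constant multiplier `inflation` on the posterior variance. The function names, the `inflation` parameter, and the values tried are hypothetical choices for illustration; the paper's actual inflation schedule and its alternative optimistic modification are not reproduced here. The helper `pull_fraction_spread` measures how much one arm's share of pulls varies across independent runs, which is a rough empirical proxy for whether pull counts concentrate around a deterministic scale.

```python
import math
import random


def variance_inflated_ts(means, horizon, inflation=1.0, seed=0):
    """Run Gaussian Thompson sampling with an inflated sampling variance.

    `inflation` is a hypothetical knob: 1.0 gives standard TS under a flat
    prior with unit-variance rewards; larger values explore more.
    Returns (pull counts per arm, sample mean per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k

    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once to initialise the sample means
        else:
            # Draw one index per arm from N(sample mean, inflation / count).
            draws = [
                rng.gauss(sums[a] / counts[a], math.sqrt(inflation / counts[a]))
                for a in range(k)
            ]
            arm = max(range(k), key=draws.__getitem__)
        reward = rng.gauss(means[arm], 1.0)  # unit-variance Gaussian reward
        counts[arm] += 1
        sums[arm] += reward

    return counts, [sums[a] / counts[a] for a in range(k)]


def pull_fraction_spread(means, horizon, inflation, runs=20):
    """Range (max minus min) across seeds of the fraction of pulls of arm 0."""
    fractions = [
        variance_inflated_ts(means, horizon, inflation, seed=s)[0][0] / horizon
        for s in range(runs)
    ]
    return max(fractions) - min(fractions)


if __name__ == "__main__":
    # Two tied optimal arms plus one suboptimal arm (assumed example instance).
    means = [0.0, 0.0, -0.5]
    horizon = 2000
    # The second inflation value is an arbitrary illustrative choice.
    for c in (1.0, math.log(horizon)):
        spread = pull_fraction_spread(means, horizon, c)
        print(f"inflation={c:.2f}  spread of arm-0 pull fraction: {spread:.3f}")
```

Comparing the printed spread for the two inflation values on an instance with tied optimal arms gives an informal feel for the stability question; it is a simulation aid only and says nothing about the paper's proofs or regret bounds.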

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference