Thompson sampling: Precise arm-pull dynamics and adaptive inference
Qiyang Han

TL;DR
This paper investigates the detailed dynamics of arm-pull counts in Thompson sampling algorithms, revealing a dichotomy in stability and enabling new inference methods for both stable and unstable arms.
Contribution
The paper characterizes the precise asymptotic behavior of arm-pull counts in Thompson sampling, which differs qualitatively from that of UCB algorithms, and introduces inference methods for both the stable and unstable regimes.
Findings
For stable arms, the normalized arm means have Gaussian limits.
Unstable arms exhibit non-Gaussian limits, enabling inference beyond stability.
A unifying principle links arm stability to the interaction with statistical noise.
Abstract
Adaptive sampling schemes are well known to create complex dependence that may invalidate conventional inference methods. A recent line of work shows that this need not be the case for UCB-type algorithms in multi-armed bandits. A central emerging theme is a "stability" property with asymptotically deterministic arm-pull counts in these algorithms, making inference as easy as in the i.i.d. setting. In this paper, we study the precise arm-pull dynamics in another canonical class of Thompson-sampling type algorithms. We show that the phenomenology is qualitatively different: the arm-pull count is asymptotically deterministic if and only if the arm is suboptimal or is the unique optimal arm; otherwise it converges in distribution to the unique invariant law of an SDE. This dichotomy uncovers a unifying principle behind many existing (in)stability results: an arm is stable if and only if…
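The dichotomy described in the abstract can be explored informally with a small simulation. The sketch below is not the paper's construction. It assumes unit-variance Gaussian rewards, one initial pull per arm, and a posterior of N(sample mean, 1 / pulls) for each arm; the function names `thompson_sampling` and `pull_fractions` are hypothetical. It records the fraction of pulls given to arm 0 across independent replicates, once with a unique optimal arm and once with two tied optimal arms. A finite-horizon run can only hint at the asymptotic statement.

```python
import random


def thompson_sampling(means, horizon, rng):
    """Gaussian Thompson sampling with unit-variance rewards.

    Assumptions (illustrative, not taken from the paper): each arm is
    pulled once to initialise, after which the posterior for arm a is
    N(sample mean of arm a, 1 / number of pulls of arm a).
    Returns the list of arm-pull counts after `horizon` rounds.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            # Initial round-robin so every arm has at least one observation.
            arm = t
        else:
            # Draw one posterior sample per arm and pull the largest draw.
            draws = [
                rng.gauss(sums[a] / counts[a], (1.0 / counts[a]) ** 0.5)
                for a in range(k)
            ]
            arm = max(range(k), key=lambda a: draws[a])
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
    return counts


def pull_fractions(means, horizon, replicates, seed=0):
    """Fraction of pulls given to arm 0 in each independent replicate."""
    rng = random.Random(seed)
    return [
        thompson_sampling(means, horizon, rng)[0] / horizon
        for _ in range(replicates)
    ]


if __name__ == "__main__":
    settings = [
        ("unique optimal arm", [1.0, 0.0]),
        ("two tied optimal arms", [0.0, 0.0]),
    ]
    for label, means in settings:
        fractions = pull_fractions(means, horizon=2000, replicates=20)
        print(label, "min=%.3f max=%.3f" % (min(fractions), max(fractions)))
```

Comparing the spread between the minimum and maximum fraction in the two settings gives a rough feel for whether the pull count concentrates or stays random across replicates. The horizon, number of replicates, and arm means are arbitrary choices made for illustration.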
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms
