Statistical Inference under Adaptive Sampling with LinUCB
Wei Fan, Kevin Tan, Yuting Wei

TL;DR
This paper proves that the LinUCB algorithm in linear bandits satisfies stability, leading to asymptotic normality of estimators and enabling the construction of tighter confidence sets and hypothesis tests in adaptive sampling.
Contribution
It establishes a central limit theorem for LinUCB by characterizing the eigenvalues and eigenvectors of the random design feature covariance matrix when the action set is the unit ball, and it uses this result to derive asymptotically valid confidence sets.
Findings
LinUCB satisfies the stability property.
Eigenvalues and eigenvectors of the random design feature covariance matrix are characterized for the unit-ball action set.
Asymptotic normality of the estimation error is proven.
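The findings above concern the feature covariance matrix that LinUCB builds up as it plays. The following is a minimal simulation sketch, not the authors' implementation: it runs a LinUCB-style rule on the unit ball and returns that matrix so its eigenstructure can be inspected. Several choices are assumptions made for illustration only: the function name `linucb_unit_ball`, the constant exploration weight `beta`, the ridge parameter `lam`, Gaussian noise, and the approximation of the maximization over the ball by a random finite set of candidates on the unit sphere.

```python
import numpy as np


def linucb_unit_ball(theta_star, horizon=2000, beta=2.0, lam=1.0,
                     noise_sd=1.0, n_candidates=256, seed=0):
    """Simulate a LinUCB-style rule on the unit ball (illustrative sketch).

    Assumptions (not taken from the paper): constant bonus weight `beta`,
    ridge parameter `lam`, Gaussian noise, and a random candidate set on the
    unit sphere standing in for the exact maximization over the ball.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    V = lam * np.eye(d)   # regularized feature covariance matrix
    b = np.zeros(d)       # running sum of reward-weighted actions

    for _ in range(horizon):
        theta_hat = np.linalg.solve(V, b)  # ridge estimate of theta_star
        V_inv = np.linalg.inv(V)

        # Draw candidate actions uniformly on the unit sphere.
        cands = rng.normal(size=(n_candidates, d))
        cands /= np.linalg.norm(cands, axis=1, keepdims=True)

        # Upper confidence bound: estimated reward plus exploration bonus
        # beta * sqrt(a^T V^{-1} a) for each candidate a.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", cands, V_inv, cands))
        a = cands[np.argmax(cands @ theta_hat + beta * bonus)]

        reward = a @ theta_star + noise_sd * rng.normal()
        V += np.outer(a, a)
        b += reward * a

    return np.linalg.solve(V, b), V


theta_star = np.array([0.6, 0.0, 0.8])
theta_hat, V = linucb_unit_ball(theta_star)

# Inspect the eigenstructure: eigh returns eigenvalues in ascending order,
# so the last column is the leading eigenvector.
eigvals, eigvecs = np.linalg.eigh(V)
alignment = abs(eigvecs[:, -1] @ theta_star) / np.linalg.norm(theta_star)
print("eigenvalues:", eigvals)
print("alignment of leading eigenvector with theta_star:", alignment)
```

Printing the eigenvalues and the alignment gives a quick empirical view of how the leading direction relates to the true parameter. Because the candidate set is random and finite, the output only approximates the behavior of the exact algorithm.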
Abstract
Adaptively collected data has become ubiquitous within modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be mitigated by a property called stability, which states that if the rate at which an algorithm takes actions converges to a deterministic limit, one can expect that certain parameters are asymptotically normal. Building on a recent line of work for the multi-armed bandit setting, we show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies this property. In doing so, we painstakingly characterize the behavior of the eigenvalues and eigenvectors of the random design feature covariance matrix in the setting where the action set is the unit ball, showing that it decomposes into a rank-one direction that locks onto the true…
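The stability property mentioned in the abstract can be written out in symbols. The block below is a sketch of one common formalization, in the spirit of the classical Lai and Wei condition for stochastic regression; the notation ($a_t$, $r_t$, $S_n$, $B_n$, $\sigma^2$) is assumed here for illustration and may differ from the paper's, and the precise noise and regularity conditions are omitted.

```latex
% Assumed notation: a_t is the action played at round t,
% r_t = <a_t, theta^*> + eps_t is the observed reward,
% and sigma^2 is the conditional variance of the noise eps_t.
\[
S_n = \sum_{t=1}^{n} a_t a_t^{\top}, \qquad
\hat{\theta}_n = S_n^{-1} \sum_{t=1}^{n} a_t r_t .
\]
% Stability: the random matrix S_n is asymptotically deterministic, i.e.
% there exist deterministic positive definite matrices B_n such that
\[
B_n^{-1} S_n \xrightarrow{\;p\;} I_d .
\]
% Consequence (under suitable martingale noise conditions):
\[
S_n^{1/2} \bigl( \hat{\theta}_n - \theta^{\star} \bigr)
\xrightarrow{\;d\;} \mathcal{N}\bigl( 0, \sigma^{2} I_d \bigr),
\]
% which suggests the asymptotic level-(1 - alpha) confidence ellipsoid
\[
\mathcal{C}_n(\alpha) = \Bigl\{ \theta :
(\hat{\theta}_n - \theta)^{\top} S_n (\hat{\theta}_n - \theta)
\le \sigma^{2} \, \chi^{2}_{d,\,1-\alpha} \Bigr\}.
\]
```

Here $\chi^{2}_{d,1-\alpha}$ denotes the $(1-\alpha)$ quantile of the chi-squared distribution with $d$ degrees of freedom. In practice $\sigma^2$ would be replaced by a consistent estimate.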
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
