Inference for Batched Bandits
Kelly W. Zhang, Lucas Janson, Susan A. Murphy

TL;DR
This paper shows that standard inference based on the ordinary least squares (OLS) estimator can fail on data collected by bandit algorithms, and introduces the Batched OLS estimator (BOLS), which supports reliable inference on such data in multi-arm and contextual bandit settings, including when baseline rewards are non-stationary.
Contribution
The paper proves that OLS is not asymptotically normal on bandit data when there is no unique optimal arm, and introduces BOLS, a new estimator that is asymptotically normal on multi-arm and contextual bandit data.
Findings
OLS is not asymptotically normal on bandit data when there is no unique optimal arm.
BOLS is asymptotically normal on multi-arm and contextual bandit data.
BOLS is robust to non-stationarity in baseline rewards.
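The findings above can be illustrated with a small simulation. This summary does not spell out how BOLS is constructed, so the sketch below rests on an assumption: within each batch, the difference in arm means is estimated by OLS, standardized by its conditional standard deviation, and the standardized values are averaged across batches. The two-arm setup with zero margin (no unique optimal arm), the clipped greedy sampling rule, the known noise level `sigma`, and every function and parameter name are hypothetical choices made for illustration, not the authors' code.

```python
import numpy as np


def bols_statistic(rng, num_batches=5, batch_size=100, margin=0.0,
                   sigma=1.0, clip=0.2):
    """One batched-bandit run; returns a BOLS-style test statistic.

    Assumed construction (not confirmed by the summary): standardize the
    per-batch difference in arm means, then average over batches.
    """
    prob_arm1 = 0.5  # first batch samples both arms with equal probability
    z_sum = 0.0
    for _ in range(num_batches):
        # Draw actions for the whole batch using the current sampling probability.
        actions = rng.random(batch_size) < prob_arm1
        # Gaussian rewards; arm 1 has mean `margin`, arm 0 has mean 0.
        rewards = margin * actions + sigma * rng.standard_normal(batch_size)
        n1 = int(actions.sum())
        n0 = batch_size - n1
        if n0 > 0 and n1 > 0:
            # Within-batch OLS estimate of the margin (difference in means).
            diff = rewards[actions].mean() - rewards[~actions].mean()
            # Standardize by the conditional standard deviation of `diff`.
            z_sum += np.sqrt(n0 * n1 / batch_size) * (diff - margin) / sigma
            # Hypothetical clipped greedy update based on the latest batch.
            prob_arm1 = 1.0 - clip if diff > 0 else clip
    return z_sum / np.sqrt(num_batches)


def simulate(num_reps=2000, seed=0):
    """Repeat the experiment and collect the statistic from each run."""
    rng = np.random.default_rng(seed)
    return np.array([bols_statistic(rng) for _ in range(num_reps)])


stats = simulate()
print("mean:", stats.mean(), "std:", stats.std())
```

Under these assumptions the collected statistics should look approximately standard normal even though the margin is zero, which is the regime where the findings say pooled OLS loses asymptotic normality. In practice `sigma` would have to be estimated from the data rather than assumed known.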
Abstract
As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We first prove that the ordinary least squares estimator (OLS), which is asymptotically normal on independently sampled data, is not asymptotically normal on data collected using standard bandit algorithms when there is no unique optimal arm. This asymptotic non-normality result implies that the naive assumption that the OLS estimator is approximately normal can lead to Type-1 error inflation and confidence intervals with below-nominal coverage probabilities. Second, we introduce the Batched OLS estimator (BOLS) that we prove is (1) asymptotically normal on data collected…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Algorithms
