Statistical Inference with M-Estimators on Adaptively Collected Data
Kelly W. Zhang, Lucas Janson, and Susan A. Murphy

TL;DR
This paper develops theory for using M-estimators on adaptively collected data, such as data from bandit algorithms, to construct valid confidence intervals for the parameters of complex models like logistic regression.
Contribution
The paper introduces adaptive weighting schemes for M-estimators that enable valid statistical inference on data collected by bandit algorithms, extending beyond simple comparisons of means.
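As a rough sketch of what an adaptively weighted M-estimator looks like, the display below uses notation assumed for illustration rather than taken from this summary: actions $A_t$, contexts $X_t$, rewards $Y_t$, history $H_{t-1}$, a per-observation criterion $m_\theta$ (for example a log-likelihood), the bandit's action-selection probabilities $\pi_t$, and a fixed reference ("stabilizing") policy $\pi^{\mathrm{sta}}$. The square-root importance weight shown is one form such weights can take; treat it as an assumption here, not as a statement of the exact scheme.

```latex
\hat{\theta}_T \;=\; \arg\max_{\theta \in \Theta} \; \sum_{t=1}^{T} W_t \, m_\theta(Y_t, X_t, A_t),
\qquad
W_t \;=\; \sqrt{\frac{\pi^{\mathrm{sta}}(A_t, X_t)}{\pi_t(A_t, X_t, H_{t-1})}}
```

Setting $W_t = 1$ for all $t$ recovers the ordinary, unweighted M-estimator.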
Findings
M-estimators can be adapted to bandit data by adaptively weighting the observations.
The weighted estimators yield asymptotically valid confidence regions.
The approach applies to complex models such as logistic regression.
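To make the weighting idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than the authors' implementation: a toy two-armed epsilon-greedy bandit with binary rewards, a uniform reference policy, square-root importance weights, and a small Newton solver. The names `simulate_bandit` and `weighted_logistic_fit` are hypothetical. The sketch only computes the weighted point estimate; it does not construct confidence regions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate_bandit(T, theta_true, eps=0.2, seed=0):
    """Toy two-armed epsilon-greedy bandit with Bernoulli rewards (assumed setup)."""
    rng = np.random.default_rng(seed)
    actions = np.zeros(T, dtype=int)
    rewards = np.zeros(T)
    probs = np.zeros(T)  # probability with which the chosen action was selected
    counts = np.zeros(2)
    sums = np.zeros(2)
    for t in range(T):
        means = sums / np.maximum(counts, 1)  # empirical means, 0 for unseen arms
        greedy = int(np.argmax(means))
        p1 = (1 - eps) * (greedy == 1) + eps / 2  # P(A_t = 1 | history)
        a = int(rng.random() < p1)
        p_reward = sigmoid(theta_true[0] + theta_true[1] * a)
        y = float(rng.random() < p_reward)
        actions[t], rewards[t] = a, y
        probs[t] = p1 if a == 1 else 1 - p1
        counts[a] += 1
        sums[a] += y
    return actions, rewards, probs


def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Maximize the weighted logistic log-likelihood with Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (w * (y - p))                      # weighted score
        hess = (X * (w * p * (1 - p))[:, None]).T @ X   # weighted information
        step = np.linalg.solve(hess, grad)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


theta_true = np.array([0.0, 0.5])  # intercept and arm effect (assumed values)
a, y, probs = simulate_bandit(T=5000, theta_true=theta_true)

# Square-root importance weights against a uniform reference policy (assumption):
# W_t = sqrt(0.5 / pi_t(A_t)), where pi_t(A_t) is the logged selection probability.
w = np.sqrt(0.5 / probs)

X = np.column_stack([np.ones_like(y), a])  # design matrix: intercept + arm indicator
theta_hat = weighted_logistic_fit(X, y, w)
print("weighted estimate:", theta_hat)
```

The sketch relies on the selection probabilities being logged at collection time; without them the weights cannot be formed. Passing `w = np.ones_like(y)` to the same solver gives the unweighted fit for comparison.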
Abstract
Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to be able to use the resulting datasets to answer scientific questions like: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when used with data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e.g., comparison of means). Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on parameters in a logistic regression model for a binary reward. In this work, we develop theory justifying the use of…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics · Data Stream Mining Techniques
Methods: Logistic Regression
