M-estimation under Two-Phase Multiwave Sampling with Applications to Prediction-Powered Inference
Dan M. Kluger, Stephen Bates

TL;DR
This paper develops valid estimators and confidence intervals for M-estimation under adaptive two-phase multiwave sampling. Machine learning proxies, available for every unit, improve efficiency, while the adaptively collected expensive measurements remove the bias the proxies introduce.
Contribution
It introduces a novel Multiwave Predict-Then-Debias estimator that combines proxies with expensive measurements, along with an efficient sampling strategy and theoretical guarantees.
Findings
The proposed estimator is asymptotically linear and asymptotically normal, which supports asymptotically valid confidence intervals.
Simulation studies show significant efficiency gains over traditional methods.
The sampling strategy improves data collection efficiency in practice.
Abstract
In two-phase multiwave sampling, inexpensive measurements are collected on a large sample and expensive, more informative measurements are adaptively obtained on subsets of units across multiple waves. Adaptively collecting the expensive measurements can increase efficiency but complicates statistical inference. We give valid estimators and confidence intervals for M-estimation under adaptive two-phase multiwave sampling. We focus on the case where proxies for the expensive variables -- such as predictions from pretrained machine learning models -- are available for all units and propose a Multiwave Predict-Then-Debias estimator that combines proxy information with the expensive, higher-quality measurements to improve efficiency while removing bias. We establish asymptotic linearity and normality and propose asymptotically valid confidence intervals. We also develop an approximately…
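To make the predict-then-debias idea concrete, here is a minimal sketch for the simplest possible target, a population mean, under a single non-adaptive wave with known sampling probabilities. It is an illustration under those stated assumptions, not the paper's Multiwave Predict-Then-Debias estimator, which handles general M-estimation and adaptive waves. The function name `predict_then_debias_mean` and its arguments are hypothetical.

```python
import numpy as np

def predict_then_debias_mean(proxy, y, sampled, prob):
    """Illustrative predict-then-debias estimate of a mean (hypothetical helper).

    proxy   : cheap proxy values, available for all N units
    y       : expensive measurements (may be NaN where not sampled)
    sampled : boolean mask, True where the expensive measurement was collected
    prob    : known probability that each unit was selected (assumed > 0)
    """
    proxy = np.asarray(proxy, dtype=float)
    y = np.asarray(y, dtype=float)
    sampled = np.asarray(sampled, dtype=bool)
    prob = np.asarray(prob, dtype=float)

    # Inverse-probability weights: 1/prob on sampled units, 0 elsewhere.
    weights = np.where(sampled, 1.0 / prob, 0.0)
    # Proxy error, only defined where the expensive measurement exists.
    resid = np.where(sampled, y - proxy, 0.0)

    # Proxy-only estimate plus a weighted correction for the proxy's bias.
    return proxy.mean() + np.mean(weights * resid)

# Tiny worked example: four units, two sampled with probability 0.5 each.
estimate = predict_then_debias_mean(
    proxy=[1.0, 2.0, 3.0, 4.0],
    y=[2.0, np.nan, 3.0, np.nan],
    sampled=[True, False, True, False],
    prob=[0.5, 0.5, 0.5, 0.5],
)
print(estimate)  # 2.5 (proxy mean) + 0.5 (bias correction) = 3.0
```

The first term uses the proxies on all units, which is where the efficiency gain comes from; the second term uses the expensive measurements to correct the proxies' systematic error. When every unit is sampled with probability 1, the estimate reduces to the plain mean of the expensive measurements.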
Taxonomy
Topics: Survey Sampling and Estimation Techniques · Statistical Methods and Bayesian Inference · SARS-CoV-2 detection and testing
