Optimal two-phase sampling designs for generalized raking estimators with multiple parameters of interest
Jasper B. Yang, Bryan E. Shepherd, Thomas Lumley, Pamela A. Shaw

TL;DR
This paper develops optimal adaptive two-phase sampling designs for generalized raking estimators with multiple parameters, improving efficiency in large observational studies with missing or error-prone data.
Contribution
It extends existing optimal-design results for IPW and GR estimators to adaptive, multiwave two-phase sampling when multiple parameters are of interest, and compares the resulting designs with traditional approaches such as case-control sampling.
Findings
Optimized designs improve estimator efficiency over case-control sampling.
Integer-valued A-optimal allocation outperforms independent optimization.
Optimal designs for GR estimators differ from those for IPW estimators, which affects efficiency in multi-parameter settings.
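To make "integer-valued A-optimal allocation" concrete, here is a minimal sketch and not the paper's algorithm. It assumes a stratified phase 2 sample and a textbook design-based variance of the form sum_h N_h^2 (1/n_h - 1/N_h) T_h, where N_h is the phase 1 stratum size, n_h the phase 2 stratum sample size, and T_h the sum over parameters of the within-stratum variances of each parameter's influence function. A-optimality here means minimizing the sum of the parameter variances (the trace of the covariance matrix). The function names, the minimum per-stratum size n_min, and the example numbers are hypothetical. Because the objective is separable and convex in each n_h, adding one unit at a time to the stratum with the largest variance reduction gives an optimal integer allocation for this simplified objective.

```python
import heapq


def trace_variance(stratum_sizes, variance_sums, alloc):
    """Assumed objective: sum_h N_h^2 * (1/n_h - 1/N_h) * T_h."""
    return sum(
        N ** 2 * (1.0 / n - 1.0 / N) * T
        for N, T, n in zip(stratum_sizes, variance_sums, alloc)
    )


def a_optimal_integer_allocation(stratum_sizes, variance_sums, n_total, n_min=2):
    """Greedy integer allocation of n_total phase 2 units across strata.

    stratum_sizes: N_h, phase 1 counts per stratum (hypothetical inputs)
    variance_sums: T_h, per-stratum variances summed over all parameters
    n_min: smallest sample size allowed in any stratum (must be >= 1)
    """
    if n_min < 1:
        raise ValueError("n_min must be at least 1")
    # Start every stratum at the minimum size, capped at the stratum size.
    alloc = [min(n_min, N) for N in stratum_sizes]
    if sum(alloc) > n_total:
        raise ValueError("n_total is too small for the minimum stratum size")

    def gain(h):
        # Reduction in the objective from moving n_h to n_h + 1.
        n = alloc[h]
        return stratum_sizes[h] ** 2 * variance_sums[h] / (n * (n + 1))

    # heapq is a min-heap, so store negated gains to pop the largest first.
    heap = [(-gain(h), h) for h in range(len(alloc)) if alloc[h] < stratum_sizes[h]]
    heapq.heapify(heap)

    remaining = n_total - sum(alloc)
    while remaining > 0 and heap:
        _, h = heapq.heappop(heap)
        alloc[h] += 1
        remaining -= 1
        if alloc[h] < stratum_sizes[h]:
            heapq.heappush(heap, (-gain(h), h))
    return alloc


if __name__ == "__main__":
    sizes = [1000, 500, 200]   # hypothetical phase 1 stratum counts
    t_sums = [1.0, 4.0, 9.0]   # hypothetical summed variances per stratum
    allocation = a_optimal_integer_allocation(sizes, t_sums, n_total=100)
    print(allocation, trace_variance(sizes, t_sums, allocation))
```

Optimizing each parameter separately would replace T_h with a single parameter's variance and generally yields a different allocation per parameter. Summing the variances before allocating is what makes the design a compromise across parameters. The summary above notes that optimal designs for GR estimators differ from those for IPW estimators, so the variance terms for GR would need to be defined differently from this IPW-style sketch.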
Abstract
Large observational datasets, including those derived from electronic health records, are a valuable resource for medical research but are often affected by missingness, measurement error, and misclassification. Two-phase sampling with generalized raking (GR) estimation is an efficient and robust approach to statistical inference in such settings. In this approach, variables that are unavailable or measured with error in a large phase 1 cohort are obtained with higher-quality measurements in a phase 2 subsample. Previous research has studied optimal phase 2 sampling designs for inverse probability weighted (IPW) estimators in non-adaptive, multi-parameter settings, and for GR estimators in single-parameter settings. In this work, we extend these results by deriving optimal adaptive, multiwave sampling designs for IPW and GR estimators when multiple parameters are of interest. We propose…
Taxonomy
Topics: Survey Sampling and Estimation Techniques · Advanced Statistical Process Monitoring
