A maximin optimal approach for sampling designs in two-phase studies
Ruoyu Wang, Qihua Wang, Wang Miao

TL;DR
This paper introduces a maximin optimal sampling design for two-phase studies that improves estimation efficiency in model-free settings and reduces estimator variance in simulations and a real data analysis.
Contribution
It proposes a novel maximin criterion for designing sampling rules based on semiparametric efficiency bounds, applicable to general estimation problems.
Findings
Reduces estimator variance in simulations
Improves efficiency bounds for scalar and multi-dimensional parameters
Demonstrates effectiveness through real data analysis
Abstract
Data collection costs can vary widely across variables in data science tasks, and two-phase designs can be employed to reduce them. This paper considers two-phase studies in which inexpensive variables are collected for all subjects in the first phase, and expensive variables are measured for a subsample of subjects in the second phase according to a predetermined sampling rule. Estimation efficiency under a two-phase design depends heavily on the sampling rule. The existing literature primarily focuses on designing sampling rules for estimating a scalar parameter in certain parametric models or for specific estimation problems. However, in real-world scenarios the model is usually unknown, and two-phase designs are needed for model-free estimation of a scalar or multi-dimensional parameter. This paper proposes a maximin criterion to design an optimal sampling rule based on semiparametric…
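
The abstract's point that efficiency depends heavily on the sampling rule can be illustrated with a small simulation. The sketch below is not the paper's maximin design: it assumes a toy setting with a binary inexpensive variable x, an expensive outcome y whose noise is much larger when x = 1, and a simple inverse-probability-weighted (IPW) estimator of the mean of y. All function names, distributions, and sampling probabilities are hypothetical and chosen only for illustration. Both rules measure y for about half of the subjects on average.

```python
import random
import statistics


def ipw_mean(sampling_rule, n=1000, seed=0):
    """Simulate one two-phase study and return the IPW estimate of E[y].

    Assumed toy model (not from the paper): x is a fair coin, and y is
    mean-zero Gaussian noise with standard deviation 3.0 if x == 1, else 0.2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.randint(0, 1)                    # phase 1: cheap variable, seen for everyone
        y = rng.gauss(0.0, 3.0 if x == 1 else 0.2)
        p = sampling_rule(x)                     # probability of measuring y given x
        if rng.random() < p:                     # phase 2: y is measured only if sampled
            total += y / p                       # weight by the inverse sampling probability
    return total / n


def uniform_rule(x):
    """Sample every subject with the same probability."""
    return 0.5


def targeted_rule(x):
    """Oversample the noisy stratum; the average sampling rate is still 0.5."""
    return 0.9 if x == 1 else 0.1


def mc_variance(rule, reps=400):
    """Monte Carlo variance of the IPW estimate over repeated simulated studies."""
    estimates = [ipw_mean(rule, seed=s) for s in range(reps)]
    return statistics.variance(estimates)


if __name__ == "__main__":
    print("uniform rule variance: ", mc_variance(uniform_rule))
    print("targeted rule variance:", mc_variance(targeted_rule))
```

Under these assumptions the targeted rule should give a visibly smaller variance than the uniform rule at the same expected second-phase budget. The rule here is hand-picked for a known toy model, whereas the setting described in the abstract is one where the model is unknown and the parameter may be multi-dimensional.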
Taxonomy
Topics: Optimal Experimental Design Methods · Advanced Statistical Process Monitoring · Statistical Methods in Clinical Trials
