TL;DR
This paper investigates optimal two-phase sampling designs for regression analysis using design-based estimators, particularly generalized raking, to improve efficiency under resource constraints.
Contribution
It derives a closed-form solution for the optimal design for generalized raking estimators and compares it with the IPW estimator's optimal design.
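For context on what an "optimal design" looks like in the IPW case: under stratified phase-two sampling, a standard textbook result is Neyman allocation, which samples each stratum in proportion to its size times the within-stratum standard deviation of the quantity being totalled. The sketch below assumes that setting. The function name `neyman_allocation` and its arguments are illustrative, not taken from the paper, and the paper's closed-form solution for generalized raking is not reproduced here.

```python
import numpy as np

def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Neyman allocation: n_h proportional to N_h * S_h.

    Assumption: stratified simple random sampling at phase two, with
    stratum_sds holding the within-stratum standard deviations of the
    target quantity (for a regression coefficient, this would typically
    be an influence-function contribution). Returns real-valued sizes;
    rounding and capping at N_h are left out of this sketch.
    """
    N_h = np.asarray(stratum_sizes, dtype=float)
    S_h = np.asarray(stratum_sds, dtype=float)
    share = N_h * S_h
    return n_total * share / share.sum()
```

With two equally sized strata whose standard deviations are 1 and 3, the allocation places three quarters of the phase-two sample in the more variable stratum.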
Findings
The optimal designs for the IPW and generalized raking estimators can differ substantially.
Even so, the design that is optimal for IPW is nearly optimal for generalized raking in most cases.
Tailoring the design to the estimator used for analysis can yield further efficiency gains.
Abstract
Two-phase designs measure variables of interest on a subcohort where the outcome and covariates are readily available or cheap to collect on all individuals in the cohort. Given limited resource availability, it is of interest to find an optimal design that includes more informative individuals in the final sample. We explore the optimal designs and efficiencies for analysis by design-based estimators. Generalized raking is an efficient design-based estimator that improves on the inverse-probability weighted (IPW) estimator by adjusting weights based on the auxiliary information. We derive a closed-form solution of the optimal design for estimating regression coefficients from generalized raking estimators. We compare it with the optimal design for analysis via the IPW estimator and other two-phase designs in measurement-error settings. We consider general two-phase designs where the…
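To make the idea of "adjusting weights based on the auxiliary information" concrete, here is a minimal sketch of weight calibration. It assumes the linear (chi-square distance) special case of generalized raking, where the adjusted weights have a closed form. Other distance functions, such as the exponential one used in classical raking, require iteration. The function names `ipw_weights` and `rake_linear` are hypothetical and not from the paper.

```python
import numpy as np

def ipw_weights(pi):
    """Design weights: inverse of the phase-two inclusion probabilities."""
    return 1.0 / np.asarray(pi, dtype=float)

def rake_linear(d, aux_sample, aux_totals):
    """Linear-calibration weights (one special case of generalized raking).

    Finds w = d * (1 + aux_sample @ lam) such that the weighted
    phase-two totals of the auxiliary variables, aux_sample.T @ w,
    match aux_totals, the totals known from the full phase-one cohort.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(aux_sample, dtype=float)
    T = np.asarray(aux_totals, dtype=float)
    # Solve (X' D X) lam = T - X' d for the calibration multipliers.
    A = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(A, T - X.T @ d)
    return d * (1.0 + X @ lam)
```

A usage sketch: with `X_all` holding the auxiliary variables for the whole cohort and `idx` the phase-two sample indices, `rake_linear(ipw_weights(pi[idx]), X_all[idx], X_all.sum(axis=0))` returns weights that reproduce the cohort totals exactly. This calibration is where the efficiency gain over plain IPW weights comes from when the auxiliary variables are correlated with the variables measured at phase two.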
