Optimal multi-wave sampling for regression modelling in two-phase designs
Tong Chen, Thomas Lumley

TL;DR
This paper introduces a multi-wave sampling method for two-phase regression designs that uses influence functions and informative priors to approximate the optimal design and improve the efficiency of regression parameter estimates.
Contribution
It proposes a novel multi-wave sampling approach that uses influence functions and informative priors to improve the efficiency of two-phase regression designs.
Findings
Two-wave sampling with informative priors improves parameter precision.
The proposed method approximates the optimal design effectively.
Generalised raking is used to analyse the sampled data within the proposed framework.
Abstract
Two-phase designs involve measuring extra variables on a subset of a cohort in which some variables have already been measured. The goal of a two-phase design is to choose a subsample of individuals from the cohort and to analyse that subsample efficiently. Of particular interest is an optimal design, one that gives the most efficient estimates of the regression parameters. In this paper, we propose a multi-wave sampling design that approximates the optimal design for design-based estimators. Influence functions are used to compute the optimal sampling allocations. We propose using informative priors on the regression parameters to derive the wave-1 sampling probabilities, because any pre-specified sampling probabilities may be far from optimal and decrease efficiency. Generalised raking is used in the statistical analysis. We show that two-wave sampling with reasonable informative priors will end up with higher…
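A common way to turn influence functions into a stratified allocation is Neyman allocation: draw $n_h \propto N_h S_h$ from stratum $h$, where $N_h$ is the phase-1 stratum size and $S_h$ is the within-stratum standard deviation of the influence function for the target coefficient. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes stratified phase-2 sampling, a single target coefficient, and that no allocation exceeds its stratum size. The $S_h$ values are taken as inputs; in wave 1 they could be computed under prior parameter values. The function names (`neyman_allocation`, `wave2_allocation`, `stratum_sds`) are hypothetical. The wave-2 rule shown (fill each stratum's shortfall relative to the combined Neyman target) is a simple heuristic, not necessarily the allocation algorithm used in the paper.

```python
import math
from statistics import pstdev


def _round_proportional(weights, n):
    """Split integer n in proportion to non-negative weights (largest remainder)."""
    if n == 0:
        return [0] * len(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    exact = [n * w / total for w in weights]
    alloc = [math.floor(x) for x in exact]
    # Give the units lost to flooring to the strata with the largest remainders.
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(exact)),
                          key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def stratum_sds(influence_by_stratum):
    """Within-stratum population SD of (estimated) influence-function values."""
    return [pstdev(values) for values in influence_by_stratum]


def neyman_allocation(sizes, sds, n):
    """Neyman allocation: n_h proportional to N_h * S_h.

    Hypothetical helper; assumes each n_h stays at or below N_h (no capping).
    """
    return _round_proportional([N * s for N, s in zip(sizes, sds)], n)


def wave2_allocation(sizes, sds, n_wave1, n2):
    """Heuristic wave-2 allocation given the wave-1 counts n_wave1.

    Compute the Neyman target for the combined sample n1 + n2, take each
    stratum's shortfall (strata already at or above target get 0), and
    split the n2 new draws in proportion to those shortfalls.
    """
    target = neyman_allocation(sizes, sds, sum(n_wave1) + n2)
    shortfall = [max(0, t - m) for t, m in zip(target, n_wave1)]
    return _round_proportional(shortfall, n2)


if __name__ == "__main__":
    sizes = [1000, 500, 200]  # made-up phase-1 stratum sizes
    sds = [1.0, 2.0, 4.0]     # made-up influence-function SDs
    print(neyman_allocation(sizes, sds, 140))             # [50, 50, 40]
    print(wave2_allocation(sizes, sds, [60, 10, 0], 70))  # [0, 35, 35]
```

In the usage example, wave 1 over-sampled the first stratum relative to the combined target, so wave 2 assigns it no further draws and spends the remaining budget on the under-sampled strata. This is the kind of correction a second wave can make when the wave-1 probabilities were far from optimal.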
