Two-phase validation sampling via principal components to improve efficiency in multi-model estimation from error-prone biomedical databases
Sarah C. Lotspeich, Cole Manschot

TL;DR
This paper introduces a principal component-based two-phase sampling method to efficiently validate error-prone data in biomedical databases, improving multi-model estimation accuracy.
Contribution
It extends extreme tail sampling by using principal components to balance multiple models, enhancing efficiency in validation studies with multiple exposures.
Findings
Strategy improves efficiency across multiple models in simulations.
Application to NHANES data demonstrates practical benefits.
Method remains effective with correlated or heterogeneous errors.
Abstract
Two-phase sampling offers a cost-effective way to validate error-prone covariate measurements in biomedical databases. Inexpensive or easy-to-obtain information is collected for the entire study in Phase I. Then, a subset of patients undergoes cost-intensive validation (e.g., expert chart review) to collect more accurate data in Phase II. When balancing primary and secondary analyses, competing models and priorities can result in poorly defined objectives for the most informative Phase II sampling criterion. Extreme tail sampling (ETS), wherein patients with the smallest and largest values of a particular quantity (like a covariate or residual) are selected, can offer great statistical efficiency in two-phase studies when focusing on a single analytic objective by targeting observations with the biggest contributions to the Fisher information. We propose an intuitive, easy-to-use…
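The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract is truncated before the proposed method is described, so scoring each patient by the first principal component of the standardized error-prone Phase I covariates is an assumption here. The function names (`extreme_tail_sample`, `first_pc_scores`), the simulated data, and the Phase II size of 100 are likewise hypothetical.

```python
import numpy as np


def extreme_tail_sample(scores, n_validate):
    """Return sorted indices of the smallest and largest scores (ETS).

    Half of the Phase II budget goes to each tail; if n_validate is odd,
    the extra slot goes to the upper tail (an arbitrary choice).
    """
    scores = np.asarray(scores, dtype=float)
    if not 0 <= n_validate <= scores.size:
        raise ValueError("n_validate must be between 0 and the Phase I size")
    order = np.argsort(scores, kind="stable")
    n_low = n_validate // 2
    n_high = n_validate - n_low
    low = order[:n_low]
    high = order[scores.size - n_high:]
    return np.sort(np.concatenate([low, high]))


def first_pc_scores(X):
    """Score each row by the first principal component of standardized X.

    Assumes no column of X is constant (otherwise its standard deviation
    is zero and the standardization fails).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of vt are the principal directions; vt[0] has the largest variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]


# Hypothetical Phase I data: 1000 patients, 3 correlated error-prone exposures.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.4],
                [0.3, 0.4, 1.0]])
X_star = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=1000)

# Single-objective ETS: target the tails of one exposure only.
selected_single = extreme_tail_sample(X_star[:, 0], n_validate=100)

# Multi-exposure variant (assumed): target the tails of the first PC.
selected_pc = extreme_tail_sample(first_pc_scores(X_star), n_validate=100)
```

The sign of a principal component is arbitrary, but because both tails are sampled symmetrically, flipping the sign leaves the selected set unchanged when the Phase II size is even.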
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Advanced Causal Inference Techniques · Statistical Methods and Inference
