Patient Recruitment Using Electronic Health Records Under Selection Bias: a Two-phase Sampling Framework
Guanghao Zhang, Lauren J. Beesley, Bhramar Mukherjee, Xu Shi

TL;DR
This paper introduces an optimal two-phase sampling framework that uses electronic health records (EHRs) to make patient recruitment for clinical studies more efficient, while correcting for the selection bias inherent in EHR data.
Contribution
It develops a novel two-phase sampling method that accounts for selection bias and leverages auxiliary covariates in EHR data to enhance study efficiency.
Findings
Efficiency gains demonstrated through simulation studies.
Application to hypertension prevalence estimation in US adults.
Method effectively adjusts for EHR selection bias.
Abstract
Electronic health records (EHRs) are increasingly recognized as a cost-effective resource for patient recruitment in clinical research. However, how to optimally select a cohort from millions of individuals to answer a scientific question of interest remains unclear. Consider a study to estimate the mean or mean difference of an expensive outcome. Inexpensive auxiliary covariates predictive of the outcome may often be available in patients' health records, presenting an opportunity to recruit patients selectively, which may improve efficiency in downstream analyses. In this paper, we propose a two-phase sampling design that leverages available information on auxiliary covariates in EHR data. A key challenge in using EHR data for multi-phase sampling is the potential selection bias, because EHR data are not necessarily representative of the target population. Extending existing literature…
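The abstract describes the design only at a high level. As a rough illustration of the general idea, and not the authors' design or estimator (which the truncated abstract does not specify), the Python sketch below stratifies EHR patients on an inexpensive auxiliary covariate, splits a phase-two recruitment budget across strata by textbook Neyman allocation, and corrects for EHR selection with inverse-probability weights. All function names are hypothetical. The sketch assumes each patient's probability of appearing in the EHR (`pi_i`) is known or already estimated, and that stratum-level outcome standard deviations are available, for example from a pilot study or a prediction model.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record layout: (stratum from an auxiliary covariate,
# assumed EHR inclusion probability pi_i, outcome y_i). In a real study
# y_i is only measured for patients recruited in phase two.
Record = Tuple[str, float, float]


def neyman_allocation(sizes: Dict[str, int], sds: Dict[str, float],
                      n_total: int) -> Dict[str, int]:
    """Split the phase-two budget in proportion to N_h * S_h (Neyman allocation)."""
    weights = {h: sizes[h] * sds[h] for h in sizes}
    total = sum(weights.values())
    # Keep at least one recruit per stratum and never exceed the stratum size.
    # Rounding means the allocations may not sum exactly to n_total.
    return {h: min(sizes[h], max(1, round(n_total * w / total)))
            for h, w in weights.items()}


def two_phase_sample(records: List[Record], alloc: Dict[str, int],
                     seed: int = 0) -> List[Tuple[Record, float]]:
    """Stratified simple random sampling without replacement from the EHR cohort.

    Returns each recruit with its phase-two inclusion probability n_h / N_h.
    """
    rng = random.Random(seed)
    by_stratum: Dict[str, List[Record]] = {}
    for rec in records:
        by_stratum.setdefault(rec[0], []).append(rec)
    sample = []
    for h, members in by_stratum.items():
        n_h = alloc[h]
        for rec in rng.sample(members, n_h):
            sample.append((rec, n_h / len(members)))
    return sample


def weighted_mean(sample: List[Tuple[Record, float]]) -> float:
    """Hajek-type estimate of the target-population mean.

    Each recruit is weighted by 1 / (pi_i * phase-two inclusion probability),
    undoing both the EHR selection and the stratified recruitment.
    """
    num = den = 0.0
    for (_, pi_i, y_i), p2 in sample:
        w = 1.0 / (pi_i * p2)
        num += w * y_i
        den += w
    return num / den


if __name__ == "__main__":
    # Synthetic cohort: stratum "high" has a noisier outcome, so it is
    # sampled at a higher rate than stratum "low".
    gen = random.Random(1)
    cohort = [("low", 0.6, gen.gauss(120, 5)) for _ in range(800)] + \
             [("high", 0.3, gen.gauss(140, 15)) for _ in range(200)]
    alloc = neyman_allocation({"low": 800, "high": 200},
                              {"low": 5.0, "high": 15.0}, n_total=100)
    recruits = two_phase_sample(cohort, alloc, seed=2)
    print(alloc, round(weighted_mean(recruits), 1))
```

Setting each stratum's allocation equal to its size turns phase two into a census, and the estimate then reduces to a plain inverse-probability-weighted mean of the EHR cohort. Smaller allocations trade precision for recruitment cost, and Neyman allocation spends the budget where the outcome is most variable.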
Taxonomy
Topics: Advanced Causal Inference Techniques · Healthcare Policy and Management · Statistical Methods and Inference
