Overcoming data challenges through enriched validation and targeted sampling to measure whole-person health in electronic health records
Sarah C. Lotspeich, Sheetal Kedar, Rabeya Tahir, Aidan D. Keleghan, Amelia Miranda, Stephany N. Duda, Michael P. Bancks, Brian J. Wells, Ashish K. Khanna, Joseph Rigdon

TL;DR
This study develops methods to improve the measurement of whole-person health using electronic health records by addressing data missingness and errors through targeted validation and robust statistical modeling.
Contribution
It introduces a novel validation protocol and sampling strategy, combined with semiparametric estimation, to enhance data quality and analysis of the allostatic load index in EHRs.
Findings
Validation increased non-missing ALI components from 6 to 7 per patient.
Residual sampling was most effective for validation.
Higher ALI was associated with increased healthcare utilization.
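As a rough illustration of the residual sampling idea named in the findings, the sketch below fits a naive model of a utilization outcome on the unvalidated ALI and then selects the patients with the most extreme residuals for validation. This is an assumption about how such a design might look, not the authors' protocol: the simulated data, the linear working model, the even split between the two tails, and the function name `residual_sample` are all hypothetical. Only the sizes (1000 patients, 100 validated) come from the abstract.

```python
import numpy as np


def residual_sample(x_unvalidated, y, n_validate):
    """Return sorted indices of the patients with the most extreme residuals.

    Hypothetical helper: fits y ~ a + b * x by ordinary least squares on the
    error-prone (unvalidated) exposure, then takes n_validate // 2 patients
    from the lower tail of the residuals and the rest from the upper tail.
    """
    x = np.asarray(x_unvalidated, dtype=float)
    y = np.asarray(y, dtype=float)

    # Naive working model using only the unvalidated data.
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef

    # Rank patients by residual and keep both tails.
    order = np.argsort(resid)
    n_low = n_validate // 2
    n_high = n_validate - n_low
    chosen = np.concatenate([order[:n_low], order[len(order) - n_high:]])
    return np.sort(chosen)


# Simulated stand-in data (assumed, not from the paper).
rng = np.random.default_rng(0)
n_patients, n_validate = 1000, 100
ali_unvalidated = rng.uniform(0.0, 1.0, n_patients)
utilization = 1.0 + 2.0 * ali_unvalidated + rng.normal(0.0, 1.0, n_patients)

selected = residual_sample(ali_unvalidated, utilization, n_validate)
```

Here `selected` holds the indices of the 100 patients whose records would be sent for chart review under this simplified design. A real study of a binary or count utilization outcome would likely replace the linear working model with one suited to that outcome.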
Abstract
The allostatic load index (ALI) is a 10-component measure of whole-person health. Data from electronic health records (EHR) present a huge opportunity to operationalize the ALI in learning health systems; however, these data are prone to missingness and errors. Validation (e.g., through chart reviews) provides better-quality data, but realistically, only a subset of patients' data can be validated, and most protocols do not recover missing data. Using a representative sample of 1000 patients from the EHR at an extensive learning health system (100 of whom could be validated), we propose methods to design, conduct, and analyze statistically efficient and robust studies of ALI and healthcare utilization. Employing semiparametric maximum likelihood estimation, we robustly incorporate all available patient information into statistical models. Using targeted design strategies, we examine…
Taxonomy
Topics: Machine Learning in Healthcare · Cardiovascular Health and Risk Factors · Artificial Intelligence in Healthcare
