The impact of electronic health records (EHR) data continuity on prediction model fairness and racial-ethnic disparities
Yu Huang, Jingchuan Guo, Zhaoyi Chen, Jie Xu, William T Donahoo, Olveen Carasquillo, Hrushyang Adloori, Jiang Bian, Elizabeth A Shenkman

TL;DR
This study examines how variability in EHR data completeness impacts prediction model fairness and racial-ethnic disparities, emphasizing the importance of data continuity for valid and equitable health predictions.
Contribution
The study provides a comprehensive analysis of how EHR data-discontinuity influences bias, model performance, and fairness across racial and ethnic groups in health prediction models.
Findings
Higher EHR data continuity improves prediction utility.
Models trained on high-continuity data perform worse on low-continuity data.
Disparities in model fairness vary with data continuity levels.
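The findings compare model behavior across continuity levels and racial-ethnic groups. This page does not say which fairness metric the study used, so the sketch below assumes one common choice, the equal opportunity difference (the gap between the highest and lowest true positive rate across groups), and computes it separately within each continuity stratum. The record layout, the function names `tpr_by_group` and `equal_opportunity_gap`, and the toy records are all hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (continuity stratum, group label, true label, predicted label)
Record = Tuple[str, str, int, int]


def tpr_by_group(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """True positive rate for each (stratum, group) pair that has positive cases."""
    positives: Dict[Tuple[str, str], int] = defaultdict(int)
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    for stratum, group, y_true, y_pred in records:
        if y_true == 1:
            positives[(stratum, group)] += 1
            if y_pred == 1:
                hits[(stratum, group)] += 1
    return {key: hits[key] / n for key, n in positives.items()}


def equal_opportunity_gap(records: Iterable[Record]) -> Dict[str, float]:
    """Max minus min TPR across groups, computed separately within each stratum."""
    by_stratum: Dict[str, List[float]] = defaultdict(list)
    for (stratum, _group), rate in tpr_by_group(records).items():
        by_stratum[stratum].append(rate)
    return {s: max(rates) - min(rates) for s, rates in by_stratum.items()}


# Toy records for illustration only; these are not data from the study.
TOY_RECORDS: List[Record] = [
    ("high", "group_1", 1, 1), ("high", "group_1", 1, 1), ("high", "group_1", 0, 1),
    ("high", "group_2", 1, 1), ("high", "group_2", 1, 0),
    ("low", "group_1", 1, 1), ("low", "group_1", 1, 1),
    ("low", "group_2", 1, 0), ("low", "group_2", 1, 0),
]

if __name__ == "__main__":
    print(equal_opportunity_gap(TOY_RECORDS))  # {'high': 0.5, 'low': 1.0}
```

Negative cases (such as the third toy record) do not affect the true positive rate, so they are ignored here; a fuller analysis would typically add other metrics and confidence intervals before comparing strata.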
Abstract
Electronic health records (EHR) data have considerable variability in data completeness across sites and patients. Lack of "EHR data-continuity", or "EHR data-discontinuity", defined as "having medical information recorded outside the reach of an EHR system", can lead to a substantial amount of information bias. The objective of this study was to comprehensively evaluate (1) how EHR data-discontinuity introduces data bias, (2) how case-finding algorithms affect downstream prediction models, and (3) how algorithmic fairness is associated with racial-ethnic disparities. We leveraged our EHR data linked with Medicaid and Medicare claims data in the OneFlorida+ network and used a validated measure (i.e., Mean Proportions of Encounters Captured [MPEC]) to estimate patients' EHR data continuity. We developed a machine learning model for predicting type 2 diabetes (T2D) diagnosis as the use case for…
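The abstract relies on MPEC without defining it. The sketch below assumes the formulation commonly described where the measure was introduced: for each patient, take the proportion of claims-recorded outpatient encounters that also appear in the EHR and the same proportion for inpatient encounters, then average the two. The `EncounterCounts` fields and function names are hypothetical, and skipping a setting with no claims encounters (rather than scoring it as 0) is an assumption of this sketch, not something stated on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EncounterCounts:
    """Hypothetical per-patient counts from linked EHR and claims data."""
    claims_outpatient: int    # outpatient encounters found in claims
    captured_outpatient: int  # ...of which also recorded in the EHR
    claims_inpatient: int     # inpatient encounters found in claims
    captured_inpatient: int   # ...of which also recorded in the EHR


def capture_proportion(captured: int, total: int) -> Optional[float]:
    """Share of claims encounters also seen in the EHR; None if there were none."""
    if not 0 <= captured <= total:
        raise ValueError("captured must be between 0 and total")
    return None if total == 0 else captured / total


def mpec(counts: EncounterCounts) -> Optional[float]:
    """Mean of the outpatient and inpatient capture proportions.

    Assumption: a setting with no claims encounters is left out of the mean;
    returns None when neither setting has any claims encounters.
    """
    parts = [
        capture_proportion(counts.captured_outpatient, counts.claims_outpatient),
        capture_proportion(counts.captured_inpatient, counts.claims_inpatient),
    ]
    observed = [p for p in parts if p is not None]
    return sum(observed) / len(observed) if observed else None


if __name__ == "__main__":
    # 6 of 8 outpatient and 1 of 4 inpatient claims encounters captured in the EHR
    print(mpec(EncounterCounts(8, 6, 4, 1)))  # 0.5
```

Patients could then be grouped into continuity levels by thresholding this score; the cut-points the study used are not given in this excerpt.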
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Diabetes Management and Education
