Challenges and recommendations for Electronic Health Records data extraction and preparation for dynamic prediction modelling in hospitalized patients -- a practical guide
Elena Albu, Shan Gao, Pieter Stijnen, Frank E. Rademakers, Bas C T van Bussel, Taya Collyer, Tina Hernandez-Boussard, Laure Wynants, Ben Van Calster

TL;DR
This paper identifies key challenges in extracting and preparing EHR data for dynamic prediction models in hospitals and offers practical recommendations to improve data quality and model reliability.
Contribution
It provides a comprehensive list of over forty challenges across four categories and offers actionable guidance for researchers and engineers working with EHR data.
Findings
Identified over forty challenges in EHR data extraction and preparation.
Organized challenges into four key categories for clarity.
Provided actionable recommendations to address these challenges.
Abstract
Dynamic predictive modelling using electronic health record (EHR) data has gained significant attention in recent years. The reliability and trustworthiness of such models depend heavily on the quality of the underlying data, which is determined in part by the stages preceding model development: data extraction from EHR systems and data preparation. In this article, we identify over forty challenges encountered during these stages and provide actionable recommendations for addressing them. These challenges are organized into four categories: cohort definition, outcome definition, feature engineering, and data cleaning. This comprehensive list serves as a practical guide for data extraction engineers and researchers, promoting best practices and improving the quality and real-world applicability of dynamic prediction models in clinical settings.
Taxonomy
Topics: Machine Learning in Healthcare
Methods: Softmax · Attention Is All You Need
