Automating Early Disease Prediction Via Structured and Unstructured Clinical Data
Ane G Domingo-Aldama, Marcos Merino Prado, Alain García Olea, Josu Goikoetxea, Koldo Gojenola, Aitziber Atutxa

TL;DR
This paper introduces an automated method that uses NLP to extract information from discharge reports, improving early disease prediction accuracy by enriching structured EHR data.
Contribution
The study presents a fully automated pipeline leveraging unstructured clinical reports to enhance early prediction models, addressing missing data issues in electronic health records.
Findings
Models trained on report-enriched data outperform those using only structured EHR data.
The approach improves prediction accuracy for atrial fibrillation progression.
Automating report processing streamlines early prediction studies and enhances data quality.
Abstract
This study presents a fully automated methodology for early prediction studies in clinical settings, leveraging information extracted from unstructured discharge reports. The proposed pipeline uses discharge reports to support the three main steps of early prediction: cohort selection, dataset generation, and outcome labeling. By processing discharge reports with natural language processing techniques, we can efficiently identify relevant patient cohorts, enrich structured datasets with additional clinical variables, and generate high-quality labels without manual intervention. This approach addresses the frequent issue of missing or incomplete data in codified electronic health records (EHR), capturing clinically relevant information that is often underrepresented. We evaluate the methodology in the context of predicting atrial fibrillation (AF) progression, showing that predictive…
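To make the dataset-generation step more concrete, here is a minimal sketch of how variables extracted from a discharge report could fill gaps in a structured record. It is an illustration under stated assumptions, not the authors' pipeline: the keyword patterns, the simple negation rule, the field names, and the `extract_flags` and `enrich` helpers are all hypothetical, and a regex matcher stands in for the NLP techniques the paper refers to.

```python
import re

# Hypothetical keyword patterns. A real system would use a clinical NLP model
# rather than plain regular expressions.
PATTERNS = {
    "hypertension": re.compile(r"\bhypertension\b", re.IGNORECASE),
    "diabetes": re.compile(r"\bdiabetes\b", re.IGNORECASE),
}

# Very rough negation cue: a negating word anywhere earlier in the same sentence.
NEGATION = re.compile(r"\b(no|denies|without)\b[^.]*$", re.IGNORECASE)


def extract_flags(report: str) -> dict:
    """Return {variable: bool} for each variable mentioned in the report text."""
    flags = {}
    for sentence in report.split("."):
        for name, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            negated = NEGATION.search(sentence[: match.start()]) is not None
            # Keep the first mention of each variable.
            flags.setdefault(name, not negated)
    return flags


def enrich(structured: dict, report: str) -> dict:
    """Fill only the missing (None or absent) structured fields from the report."""
    enriched = dict(structured)
    for name, value in extract_flags(report).items():
        if enriched.get(name) is None:
            enriched[name] = value
    return enriched


# Illustrative record: hypertension is missing in the codified data.
record = {"age": 71, "hypertension": None, "diabetes": False}
report = "Patient with long-standing hypertension. No history of diabetes."
enriched_record = enrich(record, report)
```

In this sketch, values already present in the structured record take precedence, and the report is consulted only where the codified data is missing. That is one possible design choice, not one the summary above specifies.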
