Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data
Spiros Denaxas, Pontus Stenetorp, Sebastian Riedel, Maria Pikoula, Richard Dobson, Harry Hemingway

TL;DR
This study learns global vectors (GloVe) embeddings of diagnoses and procedures from UK EHR data and applies them to heart failure risk prediction, indicating that embeddings can support robust models and reduce manual feature engineering.
Contribution
It introduces the use of GloVe embeddings for EHR data to improve disease risk prediction models, addressing challenges of high dimensionality and heterogeneity.
Findings
Embeddings improved heart failure risk prediction accuracy.
Embeddings reduced reliance on manual feature engineering.
Embeddings enabled robust EHR-derived risk prediction models.
Abstract
Electronic health records (EHR) are increasingly being used for constructing disease risk prediction models. Feature engineering in EHR data, however, is challenging due to their highly dimensional and heterogeneous nature. Low-dimensional representations of EHR data can potentially mitigate these challenges. In this paper, we use global vectors (GloVe) to learn word embeddings for diagnoses and procedures recorded using 13 million ontology terms across 2.7 million hospitalisations in national UK EHR. We demonstrate the utility of these embeddings by evaluating their performance in identifying patients who are at higher risk of being hospitalised for congestive heart failure. Our findings indicate that embeddings can enable the creation of robust EHR-derived disease risk prediction models and address some of the limitations associated with manual clinical feature engineering.
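To make the idea of GloVe embeddings over clinical codes concrete, here is a minimal sketch, not the authors' pipeline. It assumes each hospitalisation is treated as a bag of codes, that two codes co-occur when they appear in the same hospitalisation, and that plain SGD is used in place of the AdaGrad optimiser from the original GloVe implementation. The toy admissions, the ICD-10-style codes, and the names `cooccurrence_counts` and `train_glove` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def cooccurrence_counts(admissions):
    """Count how often two codes appear in the same admission (symmetric)."""
    counts = Counter()
    for codes in admissions:
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def train_glove(counts, vocab, dim=8, epochs=200, lr=0.05,
                x_max=10.0, alpha=0.75, seed=0):
    """Fit GloVe-style vectors with plain SGD (an assumption; GloVe uses AdaGrad)."""
    rng = np.random.default_rng(seed)
    index = {code: i for i, code in enumerate(vocab)}
    n = len(vocab)
    W = rng.normal(scale=0.1, size=(n, dim))  # "word" vectors
    C = rng.normal(scale=0.1, size=(n, dim))  # "context" vectors
    bw = np.zeros(n)                          # word biases
    bc = np.zeros(n)                          # context biases
    pairs = [(index[a], index[b], x) for (a, b), x in counts.items()]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for i, j, x in pairs:
            # GloVe weighting: down-weights rare pairs, caps frequent ones at 1.
            weight = min(1.0, (x / x_max) ** alpha)
            err = W[i] @ C[j] + bw[i] + bc[j] - np.log(x)
            total += weight * err ** 2
            grad = weight * err  # constant factor 2 absorbed into lr
            w_old = W[i].copy()
            W[i] -= lr * grad * C[j]
            C[j] -= lr * grad * w_old
            bw[i] -= lr * grad
            bc[j] -= lr * grad
        losses.append(total)
    # Summing word and context vectors is the usual GloVe convention.
    return W + C, losses


# Hypothetical toy data: each inner list is one hospitalisation.
admissions = [
    ["I10", "I50", "N18"],
    ["I10", "I50"],
    ["I10", "E11", "I50"],
    ["J44", "I10"],
    ["E11", "N18"],
    ["I25", "I50", "I10"],
]
vocab = sorted({code for codes in admissions for code in codes})
counts = cooccurrence_counts(admissions)
embeddings, losses = train_glove(counts, vocab)
```

Each row of `embeddings` is a dense vector for one code. In a setting like the one described in the abstract, such vectors could be aggregated per patient and passed to a downstream classifier; that step is not shown here because the summary does not specify how it was done.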
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Biomedical Text Mining and Ontologies
