Exploring Long-Term Prediction of Type 2 Diabetes Microvascular Complications
Elizabeth Remfry, Rafael Henkin, Michael R Barnes, Aakanksha Naik

TL;DR
This study evaluates a code-agnostic, language model-based approach for long-term prediction of microvascular complications in Type 2 Diabetes patients using UK EHR data, showing improved performance over traditional code-based models.
Contribution
It introduces a novel text-based, code-agnostic modeling method for predicting diabetes complications, emphasizing the importance of context length in model performance.
Findings
Code-agnostic models outperform code-based models.
Prediction accuracy improves with longer time windows.
Performance is biased towards earlier complications.
Abstract
Electronic healthcare records (EHR) contain a huge wealth of data that can support the prediction of clinical outcomes. EHR data is often stored and analysed using clinical codes (ICD-10, SNOMED); however, these can differ across registries and healthcare providers. Integrating data across systems involves mapping between different clinical ontologies, which requires domain expertise and at times results in data loss. To overcome this, code-agnostic models have been proposed. We assess the effectiveness of a code-agnostic representation approach on the task of long-term microvascular complication prediction for individuals living with Type 2 Diabetes. Our method encodes individual EHRs as text using fine-tuned, pretrained clinical language models. Leveraging large-scale EHR data from the UK, we employ a multi-label approach to simultaneously predict the risk of microvascular complications…
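To make the idea of a code-agnostic, multi-label setup concrete, here is a minimal sketch of how one patient record might be turned into text plus a multi-label target. It is not the authors' pipeline: the `Event` class, the `[SEP]` separator, the helper names, and the three-label set (retinopathy, nephropathy, neuropathy) are all assumptions made for illustration, and the language-model encoding step is left out entirely.

```python
from dataclasses import dataclass

# Assumed label set: the three classic microvascular complications.
# The paper's exact label definitions are not given in the abstract.
COMPLICATIONS = ("retinopathy", "nephropathy", "neuropathy")


@dataclass
class Event:
    """Hypothetical EHR event: a date and a human-readable term."""
    date: str         # ISO date string, so lexical sort == chronological sort
    description: str  # text description used instead of the clinical code


def serialise_record(events):
    """Order events by date and join their descriptions into one string."""
    ordered = sorted(events, key=lambda e: e.date)
    return " [SEP] ".join(e.description.lower() for e in ordered)


def multi_label_target(diagnosed):
    """Return one binary indicator per complication, in a fixed order."""
    return [1 if c in diagnosed else 0 for c in COMPLICATIONS]


record = [
    Event("2015-06-10", "Type 2 diabetes mellitus"),
    Event("2012-03-01", "Essential hypertension"),
]
text = serialise_record(record)
target = multi_label_target({"nephropathy"})
```

Because the model input is the term description rather than the code, records from systems that use different ontologies can share one representation without an explicit code mapping. The text string would then be passed to a pretrained clinical language model with one output per complication.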
Taxonomy
Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare
