Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models

Frederike Lübeck; Jonas Wildberger; Frederik Träuble; Maximilian Mordig; Sergios Gatidis; Andreas Krause; Bernhard Schölkopf

arXiv:2505.24655·cs.AI·June 2, 2025

TL;DR

AdaCVD is a cardiovascular disease (CVD) risk prediction framework built on large language models. It flexibly integrates diverse patient data, adapts to new populations, and outperforms existing risk models in benchmark comparisons.

Contribution

The paper introduces AdaCVD, the first adaptable LLM-based CVD risk prediction model that handles heterogeneous data and distribution shifts in clinical practice.

Findings

1. Surpasses established risk scores and machine learning models in benchmarks.
2. Demonstrates robust performance across diverse demographic and clinical subgroups.
3. Effectively integrates structured data and unstructured text for comprehensive risk assessment.

Abstract

Cardiovascular disease (CVD) risk prediction models are essential for identifying high-risk individuals and guiding preventive actions. However, existing models struggle with the challenges of real-world clinical practice as they oversimplify patient profiles, rely on rigid input schemas, and are sensitive to distribution shifts. We developed AdaCVD, an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank. In benchmark comparisons, AdaCVD surpasses established risk scores and standard machine learning approaches, achieving state-of-the-art performance. Crucially, for the first time, it addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it…
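The abstract does not say how AdaCVD encodes patient records for the language model. As a minimal sketch, assuming a plain "name: value" text serialization (our assumption, not the authors' format), the snippet below shows why a text-based model can accept records with different subsets of variables and append unstructured notes without a rigid input schema. The function `serialize_patient`, the field names, and the question wording are all hypothetical.

```python
from typing import Mapping, Optional, Union

# Hypothetical value types for structured fields; None marks a missing measurement.
Value = Union[int, float, str, bool, None]


def serialize_patient(record: Mapping[str, Value], notes: Optional[str] = None) -> str:
    """Hypothetical serializer: structured fields plus optional free text -> one prompt.

    Missing fields (None) are skipped, so records with different subsets of
    variables share no fixed schema. This is an illustrative assumption, not
    the AdaCVD prompt format.
    """
    lines = []
    for name, value in record.items():
        if value is None:
            continue  # omit missing values instead of imputing them
        if isinstance(value, bool):
            value = "yes" if value else "no"  # render flags as words
        lines.append(f"{name}: {value}")
    if notes:
        lines.append(f"Clinical notes: {notes.strip()}")  # unstructured text appended as-is
    lines.append("Question: Estimate this person's cardiovascular disease risk.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only, not real participant data.
    print(serialize_patient(
        {"Age": 58, "Current smoker": True, "LDL cholesterol (mmol/L)": None},
        notes="Family history of early myocardial infarction.",
    ))
```

Skipping absent fields, rather than filling them with placeholders, is one simple way to let the same model consume both sparse and comprehensive records; a real system would also need a learned head or output parsing to turn the model's response into a calibrated risk.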
