Large Language Models are Powerful Electronic Health Record Encoders

Stefan Hegselmann; Georg von Arnim; Tillmann Rheude; Noel Kronenberg; David Sontag; Gerhard Hindricks; Roland Eils; Benjamin Wild

arXiv:2502.17403·cs.LG·April 15, 2026·3 cites

TL;DR

This paper shows that converting EHR data into plain text lets general-purpose large language models generate embeddings for clinical prediction that perform on par with a specialized EHR foundation model and show statistically significant improvements on some tasks in external validation.

Contribution

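The paper introduces a method that converts EHRs into plain text by replacing medical codes with natural-language descriptions, so that general-purpose LLMs can produce embeddings for clinical prediction without access to private medical training data.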
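As a rough illustration of the code-to-text step, the sketch below renders a list of coded events as plain text. The lookup table, the `serialize_record` helper, the `(date, code, value)` event layout, and the newest-first ordering are all assumptions made for this example; the paper's actual serialization format is not described on this page.

```python
from datetime import date

# Hypothetical code-to-description lookup. A real pipeline would draw on
# full vocabularies, which are not reproduced here.
CODE_DESCRIPTIONS = {
    "ICD10/I10": "Essential (primary) hypertension",
    "ICD10/E11.9": "Type 2 diabetes mellitus without complications",
}


def serialize_record(events, descriptions):
    """Render (date, code, value) events as plain text, newest first."""
    lines = []
    # Sorting is stable, so events on the same day keep their input order.
    for day, code, value in sorted(events, key=lambda e: e[0], reverse=True):
        # Fall back to the raw code when no description is known.
        label = descriptions.get(code, code)
        suffix = f": {value}" if value is not None else ""
        lines.append(f"{day.isoformat()} - {label}{suffix}")
    return "\n".join(lines)


events = [
    (date(2020, 1, 5), "ICD10/I10", None),
    (date(2021, 3, 2), "ICD10/E11.9", None),
    (date(2021, 3, 2), "LAB/unknown", 7.2),
]
text = serialize_record(events, CODE_DESCRIPTIONS)
print(text)
```

The resulting string is what a general-purpose LLM would receive as input. Because the text carries descriptions rather than site-specific codes, the same encoder can in principle be applied to records from a different institution.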

Findings

01

LLM-based embeddings perform on par with the specialized EHR foundation model CLMBR-T-Base across 15 clinical tasks from the EHRSHOT benchmark.

02

In external validation on the UK Biobank, an LLM-based model shows statistically significant improvements on some tasks.

03

Converting EHR data to text yields portable embeddings that do not depend on site-specific vocabularies or private medical training data, while remaining competitive in accuracy.

Abstract

Electronic Health Records (EHRs) offer considerable potential for clinical prediction, but their complexity and heterogeneity challenge traditional machine learning. Domain-specific EHR foundation models trained on unlabeled EHR data have shown improved predictive accuracy and generalization. However, their development is constrained by limited data access and site-specific vocabularies. We convert EHR data into plain text by replacing medical codes with natural-language descriptions, enabling general-purpose Large Language Models (LLMs) to produce high-dimensional embeddings for downstream prediction tasks without access to private medical training data. LLM-based embeddings perform on par with a specialized EHR foundation model, CLMBR-T-Base, across 15 clinical tasks from the EHRSHOT benchmark. In an external validation using the UK Biobank, an LLM-based model shows statistically…
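The pipeline described in the abstract (text in, fixed embedding out, lightweight model on top) can be sketched as follows. This is a minimal sketch under stated assumptions: `toy_embed` is a hashed bag-of-words stand-in, not an LLM encoder; the logistic-regression head is an assumed choice of downstream model; and the records and labels are invented for illustration.

```python
import math
import zlib

DIM = 64  # toy size; real LLM embeddings are far higher-dimensional


def toy_embed(text):
    """Stand-in for an LLM encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def predict_proba(w, b, x):
    """Logistic-regression probability for one embedding."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(xs, ys, lr=0.5, epochs=200):
    """Fit a logistic-regression head on frozen embeddings with plain SGD."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = predict_proba(w, b, x) - y  # gradient of log loss w.r.t. logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


# Invented serialized records and labels, for illustration only.
records = [
    "Essential hypertension; Type 2 diabetes mellitus",
    "Type 2 diabetes mellitus; Chronic kidney disease",
    "Routine health examination",
    "Seasonal allergic rhinitis",
]
labels = [1, 1, 0, 0]

xs = [toy_embed(r) for r in records]  # the encoder stays frozen
w, b = train_logreg(xs, labels)       # only the small head is trained
probs = [predict_proba(w, b, x) for x in xs]
print([round(p, 2) for p in probs])
```

The design point is that the encoder is never trained on patient data: only the small head sees task labels, which is what makes the approach usable without access to private medical training data.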
