Predicting Post-Traumatic Epilepsy from Clinical Records using Large Language Model Embeddings

Wenhui Cui; Nicholas Swingle; Anand A. Joshi; Dileep Nair; Richard M. Leahy

arXiv:2604.14547 · cs.LG · April 17, 2026

TL;DR

This study shows that embeddings of routinely collected clinical records, extracted with pretrained large language models (LLMs), can support early prediction of post-traumatic epilepsy (PTE), offering a resource-efficient alternative to neuroimaging.

Contribution

The paper uses pretrained LLMs as fixed feature extractors for early PTE prediction from clinical records and reports better performance than models built on tabular features alone.

Findings

01. LLM embeddings outperform tabular features in predictive accuracy.

02. Fusion of tabular data and LLM embeddings yields the highest AUC-ROC of 0.892.

03. Key predictors include acute seizures, injury severity, and ICU stay.
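
To make the fusion and AUC-ROC terms in the findings concrete, here is a minimal sketch in plain Python. It assumes the hybrid representation is a simple concatenation of each patient's tabular vector with the LLM embedding; the excerpt above does not state the fusion rule, so that choice is an assumption. The function names `fuse_features` and `auc_roc` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def fuse_features(tabular: Sequence[float], embedding: Sequence[float]) -> List[float]:
    """Concatenate tabular features with a text embedding (assumed fusion rule)."""
    return list(tabular) + list(embedding)


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise definition is quadratic in the number of cases, which is acceptable for illustration; it also makes clear why AUC-ROC is insensitive to the small share of positive (PTE) cases in the cohort.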

Abstract

Objective: Post-traumatic epilepsy (PTE) is a debilitating neurological disorder that develops after traumatic brain injury (TBI). Early prediction of PTE remains challenging due to heterogeneous clinical data, limited positive cases, and reliance on resource-intensive neuroimaging data. We investigate whether routinely collected acute clinical records alone can support early PTE prediction using language model-based approaches.

Methods: Using a curated subset of the TRACK-TBI cohort, we developed an automated PTE prediction framework that implements pretrained large language models (LLMs) as fixed feature extractors to encode clinical records. Tabular features, LLM-generated embeddings, and hybrid feature representations were evaluated using gradient-boosted tree classifiers under stratified cross-validation.

Results: LLM embeddings achieved performance improvements by capturing…
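
The stratified cross-validation mentioned in the Methods matters because positive cases are limited: a plain random split can leave a fold with no PTE cases at all. The sketch below shows one way to build stratified folds in plain Python. The fold count `k`, the `seed`, and the function name `stratified_folds` are illustrative assumptions rather than the paper's settings, and the gradient-boosted classifier itself is not implemented here.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_folds(
    labels: Sequence[int], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs with class balance kept per fold."""
    rng = random.Random(seed)

    # Group record indices by class label.
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    # Shuffle within each class, then deal indices round-robin across folds
    # so every fold receives a near-equal share of each class.
    test_sets: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            test_sets[position % k].append(idx)

    all_indices = set(range(len(labels)))
    return [(sorted(all_indices - set(test)), sorted(test)) for test in test_sets]


# Toy labels: 5 positive cases among 25 records (made-up numbers).
toy_labels = [1] * 5 + [0] * 20
folds = stratified_folds(toy_labels, k=5)
```

With these toy labels, each of the five test folds holds exactly one positive and four negatives, so every fold can contribute to an AUC-ROC estimate.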
