Predicting Post-Traumatic Epilepsy from Clinical Records using Large Language Model Embeddings
Wenhui Cui, Nicholas Swingle, Anand A. Joshi, Dileep Nair, Richard M. Leahy

TL;DR
This study shows that large language model embeddings from routine clinical records can effectively predict post-traumatic epilepsy early, offering a resource-efficient alternative to neuroimaging.
Contribution
The paper introduces a novel approach using pretrained LLMs as feature extractors for early PTE prediction from clinical records, improving performance over traditional tabular data methods.
Findings
LLM embeddings outperform tabular features in predictive accuracy.
Fusion of tabular data and LLM embeddings yields the highest AUC-ROC of 0.892.
Key predictors include acute seizures, injury severity, and ICU stay.
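For readers less familiar with the metric behind the 0.892 figure, AUC-ROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison in plain Python. The function name `auc_roc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def auc_roc(labels, scores):
    """Pairwise AUC-ROC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 2 positive cases, 3 negative cases.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.1]
toy_auc = auc_roc(toy_labels, toy_scores)  # 5 of 6 pairs are ranked correctly
```

Because the metric depends only on how cases are ranked, it is often preferred when positive cases are rare, as the abstract notes is the case for PTE.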
Abstract
Objective: Post-traumatic epilepsy (PTE) is a debilitating neurological disorder that develops after traumatic brain injury (TBI). Early prediction of PTE remains challenging due to heterogeneous clinical data, limited positive cases, and reliance on resource-intensive neuroimaging data. We investigate whether routinely collected acute clinical records alone can support early PTE prediction using language model-based approaches.
Methods: Using a curated subset of the TRACK-TBI cohort, we developed an automated PTE prediction framework that implements pretrained large language models (LLMs) as fixed feature extractors to encode clinical records. Tabular features, LLM-generated embeddings, and hybrid feature representations were evaluated using gradient-boosted tree classifiers under stratified cross-validation.
Results: LLM embeddings achieved performance improvements by capturing…
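Two ideas from the Methods can be sketched in a few lines of plain Python: stratified fold assignment, which keeps the proportion of rare positive cases roughly equal across folds, and a hybrid representation formed by concatenating tabular features with an embedding vector. This is a minimal sketch under stated assumptions, not the authors' implementation. The helper names `stratified_folds` and `fuse_features`, the fold count, and the toy values are all hypothetical, concatenation is assumed as the fusion step, and the gradient-boosted classifier itself is omitted.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to folds so each class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def fuse_features(tabular_row, embedding_row):
    """Hybrid representation (assumed here): tabular values followed by the embedding."""
    return list(tabular_row) + list(embedding_row)


# Toy data (made up): 5 positive and 20 negative cases.
toy_labels = [1] * 5 + [0] * 20
folds = stratified_folds(toy_labels, n_splits=5)
positives_per_fold = [sum(toy_labels[i] for i in fold) for fold in folds]

# Toy hybrid vector: 2 tabular values plus a 3-dimensional embedding.
hybrid = fuse_features([34.0, 1.0], [0.12, -0.40, 0.07])
```

In each round of cross-validation, one fold would serve as the held-out test set and the remaining folds as training data, so every case is scored exactly once by a model that never saw it.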
