Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, Ethan Steinberg, Jason Alan Fries, Christopher Re, Sanmi Koyejo, Nigam H. Shah

TL;DR
This paper systematically evaluates the impact of long-context models, specifically Mamba-based architectures, on clinical prediction tasks using EHR data, demonstrating improved performance and robustness over shorter-context models.
Contribution
It is the first to analyze the effect of extended context lengths on EHR modeling and introduces a comprehensive evaluation of model robustness to EHR-specific properties.
Findings
Longer-context models improve predictive performance on EHR tasks.
Mamba-based models outperform the previous state of the art on most tasks (9/14 tasks on the EHRSHOT benchmark).
Longer-context models are more robust to the peculiarities of EHR data.
Abstract
Foundation Models (FMs) trained on Electronic Health Records (EHRs) have achieved state-of-the-art results on numerous clinical prediction tasks. However, most existing EHR FMs have context windows of fewer than 1k tokens. This prevents them from modeling full patient EHRs, which can span tens of thousands of events. Recent advances in subquadratic long-context architectures (e.g., Mamba) offer a promising solution. However, their application to EHR data has not been well studied. We address this gap by presenting the first systematic evaluation of the effect of context length on modeling EHR data. We find that longer-context models improve predictive performance: our Mamba-based model surpasses the prior state of the art on 9/14 tasks on the EHRSHOT prediction benchmark. For clinical applications, however, model performance alone is insufficient; robustness to the unique properties of EHR is…
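To make the context-window limitation concrete, the following minimal Python sketch (not from the paper) shows how much of a long patient timeline a fixed window retains. It assumes one token per coded event and left truncation, i.e. keeping only the most recent events; the 20,000-event timeline and the helper names `truncate_to_context` and `coverage` are hypothetical, and the paper's actual tokenization and truncation scheme may differ.

```python
from typing import List


def truncate_to_context(events: List[str], context_length: int) -> List[str]:
    """Keep only the most recent `context_length` events (left truncation).

    Assumption: one token per event. Real EHR tokenizers may emit several
    tokens per event or truncate differently.
    """
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    # Negative slicing keeps the tail; if the timeline is shorter than the
    # window, the whole timeline is returned unchanged.
    return events[-context_length:]


def coverage(num_events: int, context_length: int) -> float:
    """Fraction of a timeline of `num_events` events that fits in the window."""
    return min(num_events, context_length) / num_events


# Hypothetical patient timeline with 20,000 coded events.
timeline = [f"event_{i}" for i in range(20_000)]

for k in (512, 1024, 2048, 4096, 8192, 16384):
    visible = truncate_to_context(timeline, k)
    print(
        f"context={k:>6}: sees {len(visible):>6} events "
        f"({coverage(len(timeline), k):.1%} of the timeline), "
        f"earliest visible = {visible[0]}"
    )
```

Under these assumptions, a 512-token window sees only the last 512 of the 20,000 events (about 2.6%), while a 16,384-token window sees roughly 82% of them; everything earlier in the record is invisible to the model.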
Code & Models
- 🤗 StanfordShahLab/gpt-base-512-clmbr (model) · 131 dl
- 🤗 StanfordShahLab/gpt-base-1024-clmbr (model) · 2 dl
- 🤗 StanfordShahLab/gpt-base-2048-clmbr (model) · 5 dl
- 🤗 StanfordShahLab/gpt-base-4096-clmbr (model) · 8 dl
- 🤗 StanfordShahLab/mamba-tiny-1024-clmbr (model) · 15 dl
- 🤗 StanfordShahLab/mamba-tiny-4096-clmbr (model) · 5 dl
- 🤗 StanfordShahLab/mamba-tiny-8192-clmbr (model) · 14 dl
- 🤗 StanfordShahLab/mamba-tiny-16384-clmbr (model) · 82 dl · ♡ 1
- 🤗 StanfordShahLab/llama-base-512-clmbr (model) · 7 dl
- 🤗 StanfordShahLab/llama-base-1024-clmbr (model)
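The checkpoints above cover context lengths from 512 to 16,384 tokens. The rough sketch below, which is not from the paper, illustrates why subquadratic architectures matter at these lengths: it compares leading-order sequence-mixing cost relative to a 512-token baseline, assuming quadratic self-attention for the GPT and Llama checkpoints and a linear-time scan for Mamba. It ignores constant factors, MLP layers, and hardware effects, so it is an asymptotic illustration rather than a measured runtime; the names `relative_cost`, `CONTEXT_LENGTHS`, and `BASELINE` are hypothetical.

```python
# Context lengths taken from the checkpoint names listed above.
CONTEXT_LENGTHS = [512, 1024, 2048, 4096, 8192, 16384]
BASELINE = 512


def relative_cost(n: int, baseline: int, exponent: int) -> float:
    """Leading-order cost of a length-n sequence relative to `baseline`.

    exponent=2 models self-attention (quadratic in n);
    exponent=1 models a linear-time scan such as Mamba's.
    Constant factors and non-sequence-mixing layers are ignored.
    """
    return (n / baseline) ** exponent


for n in CONTEXT_LENGTHS:
    print(
        f"{n:>6} tokens: attention x{relative_cost(n, BASELINE, 2):>6.0f}, "
        f"linear scan x{relative_cost(n, BASELINE, 1):>3.0f}"
    )
```

Going from 512 to 16,384 tokens is a 32x longer sequence: under these assumptions, a linear-time model pays roughly 32x more, while quadratic attention pays roughly 1,024x more.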
Taxonomy
Topics: Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging
