The Patient is not a Moving Document: A World Model Training Paradigm for Longitudinal EHR
Irsyad Adam, Zekai Chen, David Laprade, Shaun Porwal, David Laub, Erik Reinertsen, Arda Pekis, Kevin Brown

TL;DR
This paper introduces SMB-Structure, a world model for structured EHR data that simulates patient trajectories over time, outperforming traditional autoregressive models in capturing disease dynamics across large clinical cohorts.
Contribution
The paper presents a novel training paradigm combining joint-embedding prediction with next-token prediction to model patient trajectories as dynamical systems, not just documents.
Findings
Captures disease dynamics not recoverable by autoregressive models
Achieves competitive performance on complex, heterogeneous clinical tasks
Validated on large-scale oncology and pulmonary embolism cohorts
Abstract
Large language models (LLMs) trained with next-token prediction have achieved success as clinical foundation models. Representations from these language backbones yield strong linear-probe performance across biomedical tasks, suggesting that patient semantics emerge from next-token prediction at scale. However, this paradigm treats a patient as a document to be summarized rather than a dynamical system to be simulated; a patient's trajectory emerges from their state evolving under interventions and time, requiring models that simulate dynamics rather than predict tokens. To address this, we introduce SMB-Structure, a world model for structured EHR that grounds a joint-embedding prediction architecture (JEPA) with next-token prediction (SFT). SFT grounds our model to reconstruct future patient states in token space, while JEPA predicts those futures in latent space from the initial patient…
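The abstract describes two objectives trained together: a token-space loss (SFT) and a latent-space prediction loss (JEPA). The sketch below only illustrates how two such terms might be combined. It is not the authors' implementation: the function names, the use of mean squared error for the latent term, the `jepa_weight` parameter, and the array shapes are all assumptions made for illustration.

```python
import numpy as np

def next_token_loss(logits, targets):
    """Mean cross-entropy over future tokens (the token-space term).

    logits: (num_tokens, vocab_size) array; targets: (num_tokens,) int array.
    """
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

def latent_prediction_loss(predicted, target):
    """Mean squared error between predicted and target future embeddings.

    MSE is an assumption here; the source does not state the latent loss.
    """
    return float(np.mean((predicted - target) ** 2))

def combined_loss(logits, targets, predicted, target, jepa_weight=1.0):
    """Weighted sum of the two terms; jepa_weight is a hypothetical knob."""
    return (next_token_loss(logits, targets)
            + jepa_weight * latent_prediction_loss(predicted, target))
```

In this toy form, the token-space term rewards reconstructing the future record, while the latent term rewards predicting an embedding of that future; how the two are actually balanced and scheduled in SMB-Structure is not stated in the text above.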
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Topic Modeling
