Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Yixuan Yang; Mehak Arora; Ryan Zhang; Baraa Abed; Junseob Kim; Tilendra Choudhary; Md Hassanuzzaman; Kevin Zhu; Ayman Ali; Chengkun Yang; Alasdair Edward Gent; Victor Moas; and Rishikesan Kamaleswaran

arXiv:2605.10840 · cs.LG · May 13, 2026

TL;DR

Clin-JEPA introduces a stable multi-phase co-training framework for joint-embedding predictive pretraining on EHR data, enabling a single model to forecast patient trajectories and perform diverse risk predictions without task-specific fine-tuning.

Contribution

The paper develops a novel five-phase curriculum for stably co-training the encoder and predictor. It addresses the instability of naïve encoder-predictor co-training in joint-embedding pretraining on EHR data.

Findings

01

Over a 48-hour latent rollout, drift converges (-15.7%), whereas the baselines' drift diverges (+3% to +4951%).

02

The encoder learns a clinically meaningful latent geometry that separates patient cohorts.

03

A single backbone outperforms the baselines on multi-task risk prediction, reaching an AUROC of up to 0.883.

Abstract

We present Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision. Extending the paradigm to EHR data (to obtain a single backbone that simultaneously forecasts patient trajectories and serves diverse downstream risk-prediction tasks without per-task fine-tuning) remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train it on a frozen pretrained encoder (V-JEPA 2-AC), leaving the encoder unaware of the rollout signal that the retained predictor must use at inference; co-training the encoder and predictor under a shared JEPA prediction objective would supply this grounding, but naïve co-training is unstable, with representation…
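
The abstract contrasts two approaches: training the predictor on a frozen encoder (as in V-JEPA 2-AC), and co-training the encoder and predictor under a shared JEPA prediction objective. The following is a minimal NumPy sketch of that contrast. It is not the Clin-JEPA method. Several elements are hypothetical assumptions made for illustration:

- the linear encoder and predictor;
- the toy transition data;
- the EMA target encoder, a convention borrowed from I-JEPA/V-JEPA;
- every name in the code (`make_transition_pairs`, `jepa_loss_and_grads`, `train`).

The sketch implements plain naïve co-training only. The paper's five-phase curriculum for stabilizing it is not described here and is not reproduced.

```python
import numpy as np


def make_transition_pairs(n, d_x, rng):
    """Hypothetical toy data: pairs (x_t, x_{t+1}) with x_{t+1} = A x_t + noise."""
    Q, _ = np.linalg.qr(rng.normal(size=(d_x, d_x)))
    A = 0.9 * Q                                   # stable linear "dynamics"
    X_now = rng.normal(size=(n, d_x))
    X_next = X_now @ A.T + 0.05 * rng.normal(size=(n, d_x))
    return X_now, X_next


def jepa_loss_and_grads(W_e, W_p, W_tgt, X_now, X_next):
    """Latent prediction loss 0.5 * mean ||W_p W_e x_t - W_tgt x_{t+1}||^2.

    The target branch (W_tgt) is treated as stop-gradient, so no gradient
    is returned for it.
    """
    n = X_now.shape[0]
    Z = X_now @ W_e.T                 # context latents, shape (n, d_z)
    P = Z @ W_p.T                     # predicted next latents, shape (n, d_z)
    Y = X_next @ W_tgt.T              # target latents (stop-gradient)
    R = P - Y
    loss = 0.5 * np.sum(R ** 2) / n
    G = R / n                         # dL/dP
    grad_W_p = G.T @ Z                # dL/dW_p, shape (d_z, d_z)
    grad_W_e = (G @ W_p).T @ X_now    # dL/dW_e, shape (d_z, d_x)
    return loss, grad_W_e, grad_W_p


def train(co_train, steps=200, lr=0.05, ema=0.99, d_x=8, d_z=4, n=512, seed=0):
    """Train the predictor alone (frozen encoder) or encoder + predictor jointly."""
    rng = np.random.default_rng(seed)
    X_now, X_next = make_transition_pairs(n, d_x, rng)
    W_e = 0.3 * rng.normal(size=(d_z, d_x))       # online encoder
    W_p = 0.3 * rng.normal(size=(d_z, d_z))       # latent predictor
    W_tgt = W_e.copy()                            # target encoder
    W_e_init = W_e.copy()
    losses = []
    for _ in range(steps):
        loss, g_e, g_p = jepa_loss_and_grads(W_e, W_p, W_tgt, X_now, X_next)
        losses.append(loss)
        W_p -= lr * g_p                           # predictor is always trained
        if co_train:
            # Co-training: the encoder also receives the prediction gradient,
            # and the target encoder tracks it by an exponential moving average.
            W_e -= lr * g_e
            W_tgt = ema * W_tgt + (1 - ema) * W_e
        # Frozen mode: encoder and target stay at their initial values.
    return losses, W_e_init, W_e


if __name__ == "__main__":
    frozen, _, _ = train(co_train=False)
    joint, _, _ = train(co_train=True)
    print(f"frozen encoder: loss {frozen[0]:.3f} -> {frozen[-1]:.3f}")
    print(f"co-trained:     loss {joint[0]:.3f} -> {joint[-1]:.3f}")
```

In frozen mode only `W_p` moves, so the encoder never sees the prediction signal. In co-training mode the same loss also reshapes `W_e`, which is the kind of grounding the abstract describes. Note that nothing in this toy objective rules out a trivial solution in which `W_e` and `W_tgt` shrink toward zero. That is why plain co-training needs extra stabilization.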
