ProteinJEPA: Latent prediction complements protein language models
Dan Ofer, Dafna Shahaf, Michal Linial

TL;DR
This paper introduces masked-position MLM+JEPA, a training recipe for protein language models that adds latent-space (JEPA) prediction at masked positions to the standard masked language modeling (MLM) loss. It outperforms MLM-only training on most tasks of a 16-task downstream suite.
Contribution
The study shows that adding latent-space prediction at masked positions to the MLM objective, rather than replacing MLM, improves downstream performance of protein encoders under a matched wall-clock budget.
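The combined objective can be sketched in a few lines. This is a minimal, pure-Python illustration under stated assumptions, not the authors' implementation. The squared-error latent distance, the `jepa_weight` coefficient and all function names are hypothetical. The target latents are assumed to come from a separate target encoder, for example a stop-gradient or EMA copy, as is common in JEPA-style methods. The `masked_only` flag contrasts the masked-position variant with all-position latent prediction.

```python
import math
from typing import List, Sequence, Tuple


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Token-level cross-entropy via a numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Assumed latent distance: sum of squared differences (hypothetical choice)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def mlm_jepa_loss(
    logits: List[List[float]],          # per-position amino-acid logits
    token_ids: List[int],               # true amino-acid ids
    pred_latents: List[List[float]],    # predictor outputs (context encoder side)
    target_latents: List[List[float]],  # assumed target-encoder outputs (no gradient)
    mask: List[bool],                   # True where the input token was masked
    jepa_weight: float = 1.0,           # hypothetical weighting of the latent term
    masked_only: bool = True,           # True: masked-position JEPA; False: all positions
) -> Tuple[float, float, float]:
    masked = [i for i, is_masked in enumerate(mask) if is_masked]
    if not masked:
        raise ValueError("need at least one masked position")
    # MLM term: cross-entropy only at masked positions.
    mlm = sum(cross_entropy(logits[i], token_ids[i]) for i in masked) / len(masked)
    # Latent term: masked positions only, or every position for comparison.
    positions = masked if masked_only else list(range(len(mask)))
    jepa = sum(squared_error(pred_latents[i], target_latents[i]) for i in positions) / len(positions)
    return mlm + jepa_weight * jepa, mlm, jepa
```

With `masked_only=True` the latent loss ignores unmasked positions. This mirrors the distinction drawn in the abstract between masked-position and all-position latent prediction. Keeping the `mlm` term in the total corresponds to "retaining the MLM cross-entropy".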
Findings
Masked-position MLM+JEPA beats MLM-only continuation on 11 of 16 tasks with ESM2-150M (10 of 16 with ESM2-35M).
Gains observed in stability, enzyme classification, and fold retrieval.
JEPA-only training collapses; the combined approach is competitive.
Abstract
Protein language models are trained primarily with masked language modeling (MLM), which predicts amino-acid identities at masked positions. We ask whether latent-space prediction can complement these token-level objectives under a matched wall-clock budget. Across pretrained and randomly initialized protein sequence encoders with 35–150M parameters, we find that the best protein-JEPA design is not all-position latent prediction. Instead it is a variant that predicts latent targets only at masked positions while retaining the MLM cross-entropy. We call this recipe masked-position MLM+JEPA. On a 16-task downstream suite (15 frozen linear probes plus SCOPe-40 zero-shot fold retrieval), under matched wall-clock budgets, this recipe wins more tasks than it loses against MLM-only continuation: 10 wins / 3 losses / 3 ties (hereafter W/L/T) on pretrained ESM2-35M and 11/2/3 on ESM2-150M, while results in pretraining from…
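The W/L/T tallies above compare two methods task by task. A minimal sketch of such a tally follows, assuming every metric is higher-is-better and that scores within a hypothetical tolerance `tie_tol` count as ties. The abstract does not state how ties are decided, so both assumptions are illustrative only. The function name and the example task names and scores are hypothetical and are not results from the paper.

```python
from typing import Dict, Tuple


def win_loss_tie(
    candidate: Dict[str, float],  # per-task scores of the new recipe
    baseline: Dict[str, float],   # per-task scores of the baseline
    tie_tol: float = 0.0,         # hypothetical tie threshold
) -> Tuple[int, int, int]:
    """Count tasks where candidate wins, loses or ties (higher is assumed better)."""
    if candidate.keys() != baseline.keys():
        raise ValueError("both methods must be scored on the same tasks")
    wins = losses = ties = 0
    for task, score in candidate.items():
        diff = score - baseline[task]
        if abs(diff) <= tie_tol:
            ties += 1
        elif diff > 0:
            wins += 1
        else:
            losses += 1
    return wins, losses, ties


# Illustrative, made-up scores on three hypothetical tasks.
new = {"task_a": 0.80, "task_b": 0.70, "task_c": 0.50}
old = {"task_a": 0.75, "task_b": 0.72, "task_c": 0.50}
print(win_loss_tie(new, old, tie_tol=0.005))  # prints (1, 1, 1)
```

Because every task lands in exactly one bucket, W + L + T always equals the suite size. This is why both reported tallies (10/3/3 and 11/2/3) sum to 16. The choice of `tie_tol` can move tasks between the tie and win/loss buckets.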
