ProteinJEPA: Latent prediction complements protein language models

Dan Ofer; Dafna Shahaf; Michal Linial

arXiv:2605.07554 · cs.LG · May 11, 2026

TL;DR

This paper introduces masked-position MLM+JEPA, a training recipe for protein language models that adds latent-space prediction at masked positions to the standard MLM objective. Under matched wall-clock budgets, it wins more downstream tasks than it loses against MLM-only training.

Contribution

The study demonstrates that integrating latent-space prediction at masked positions with traditional MLM enhances protein model performance.

Findings

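1. Masked-position MLM+JEPA outperforms MLM-only continuation on 11 of 16 tasks for pretrained ESM2-150M (11 wins / 2 losses / 3 ties; 10/3/3 on ESM2-35M).
2. Gains are observed in stability prediction, enzyme classification, and fold retrieval.
3. JEPA-only training collapses, whereas the combined MLM+JEPA objective remains competitive.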
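Findings such as "11 of 16 tasks" come from a per-task win/loss/tie (W/L/T) comparison against the MLM-only baseline. The Python below is a minimal sketch of such a tally, not the paper's evaluation code: it assumes higher scores are better on every task and uses a hypothetical tie_margin, since the tie criterion used in the paper is not stated here.

```python
from typing import Dict, Sequence


def win_loss_tie(
    candidate: Sequence[float],
    baseline: Sequence[float],
    tie_margin: float = 0.0,  # hypothetical: the paper's tie criterion is not given here
) -> Dict[str, int]:
    """Count per-task wins, losses, and ties, assuming higher scores are better."""
    if len(candidate) != len(baseline):
        raise ValueError("score lists must cover the same tasks")
    counts = {"W": 0, "L": 0, "T": 0}
    for c, b in zip(candidate, baseline):
        diff = c - b
        if abs(diff) <= tie_margin:
            counts["T"] += 1  # difference too small to call either way
        elif diff > 0:
            counts["W"] += 1  # candidate beats the baseline on this task
        else:
            counts["L"] += 1  # baseline beats the candidate on this task
    return counts
```

With toy scores (not results from the paper), win_loss_tie([0.80, 0.70, 0.65, 0.90], [0.78, 0.72, 0.65, 0.85], tie_margin=0.005) returns {'W': 2, 'L': 1, 'T': 1}. For a metric where lower is better, negate both score lists before tallying.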

Abstract

Protein language models are trained primarily with masked language modeling (MLM), which predicts amino-acid identities at masked positions. We ask whether latent-space prediction can complement these token-level objectives under a matched wall-clock budget. Across pretrained and random-init protein sequence encoders at 35–150M parameters, we find that the best protein-JEPA design is not all-position latent prediction but a variant: predicting latent targets only at masked positions while retaining the MLM cross-entropy. We call this recipe masked-position MLM+JEPA. On a 16-task downstream suite (15 frozen linear probes plus SCOPe-40 zero-shot fold retrieval), under matched wall-clock budgets, this recipe wins more tasks than it loses against MLM-only continuation: 10 wins / 3 losses / 3 ties (hereafter W/L/T) on pretrained ESM2-35M and 11/2/3 on ESM2-150M, while results in pretraining from…
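The masked-position MLM+JEPA recipe described in the abstract can be sketched as a single scalar loss. The Python below is a minimal, dependency-free illustration, not the authors' implementation: it assumes a squared-error latent distance, a hypothetical jepa_weight mixing coefficient, and target latents supplied by a separate target encoder (for example an EMA copy, as is common in JEPA setups; not modeled here). Both terms are averaged over masked positions only, which is what distinguishes this variant from all-position latent prediction.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target token (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two latent vectors (assumed distance)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mlm_jepa_loss(
    logits: List[List[float]],          # per-position amino-acid logits from the MLM head
    tokens: List[int],                  # original (unmasked) amino-acid ids
    pred_latents: List[List[float]],    # per-position outputs of a latent predictor
    target_latents: List[List[float]],  # per-position targets from a target encoder
    mask: List[bool],                   # True where the input token was masked
    jepa_weight: float = 1.0,           # hypothetical weight on the latent term
) -> float:
    masked = [i for i, m in enumerate(mask) if m]
    if not masked:
        return 0.0
    # MLM term: token cross-entropy at masked positions, as in standard MLM.
    mlm = sum(cross_entropy(logits[i], tokens[i]) for i in masked) / len(masked)
    # JEPA term: latent regression restricted to the same masked positions.
    jepa = sum(mse(pred_latents[i], target_latents[i]) for i in masked) / len(masked)
    return mlm + jepa_weight * jepa
```

With a single masked position whose logits are [0.0, 0.0] (target id 1) and whose predicted and target latents are [0.0, 0.0] and [1.0, 1.0], the loss is ln 2 + 1.0 ≈ 1.693; prediction errors at unmasked positions do not contribute.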
