JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures
Ariel Larey, Elay Dahan, Amit Bleiweiss, Raizy Kellerman, Guy Leib, Omri Nayshool, Dan Ofer, Tal Zinger, Dan Dominissini, Gideon Rechavi, Nicole Bussola, Simon Lee, Shane O'Connell, Dung Hoang, Marissa Wirth, Alexander W. Charney, Nati Daniel, Yoli Shavit

TL;DR
JEPA-DNA is a pre-training framework for genomic foundation models that combines the Joint-Embedding Predictive Architecture (JEPA) with traditional generative objectives (MLM or NTP), so that learned representations capture functional genomic context beyond local motifs.
Contribution
It couples token-level recovery with a predictive objective in the latent space (latent grounding), extending the MLM and NTP paradigms to improve functional representations in genomic foundation models.
Findings
Outperforms generative-only models on diverse genomic benchmarks.
Provides more biologically grounded and robust representations.
Enhances zero-shot and supervised task performance.
Abstract
Genomic Foundation Models (GFMs) have largely relied on Masked Language Modeling (MLM) or Next Token Prediction (NTP) to learn the language of life. While these paradigms excel at capturing local genomic syntax and fine-grained motif patterns, they often fail to capture the broader functional context, resulting in representations that lack a global biological perspective. We introduce JEPA-DNA, a novel pre-training framework that integrates the Joint-Embedding Predictive Architecture (JEPA) with traditional generative objectives. JEPA-DNA introduces latent grounding: it couples token-level recovery with a predictive objective in the latent space, applied by supervising a CLS token. This forces the model to predict the high-level functional embeddings of masked genomic segments rather than focusing solely on individual nucleotides. JEPA-DNA extends both NTP and MLM paradigms and can be deployed…
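As a rough illustration of the latent grounding described in the abstract, the sketch below adds a latent-space term on a CLS embedding to a token-level recovery term. The abstract does not state the loss form, the weighting, or how the target embedding is produced, so everything here is an assumption: the function names, the use of cosine distance for the latent term, the `latent_weight` parameter, and the idea that `target_cls` comes from a separate target encoder as is common in JEPA-style setups.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one masked position (token-level recovery)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def cosine_distance(u, v):
    """1 - cosine similarity; an ASSUMED choice for the latent-space term."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def combined_loss(masked_logits, masked_targets, pred_cls, target_cls,
                  latent_weight=1.0):
    """Hypothetical combined objective: token recovery + latent prediction.

    masked_logits  : per-masked-position logits over the nucleotide vocabulary
    masked_targets : true token index at each masked position
    pred_cls       : CLS embedding predicted from the masked (context) input
    target_cls     : embedding of the masked segment (ASSUMED to come from a
                     target encoder; not specified in the abstract)
    latent_weight  : ASSUMED scalar balancing the two terms
    """
    token_loss = sum(
        cross_entropy(l, t) for l, t in zip(masked_logits, masked_targets)
    ) / len(masked_targets)
    latent_loss = cosine_distance(pred_cls, target_cls)
    return token_loss + latent_weight * latent_loss

# Toy usage: two masked positions over a 4-letter vocabulary (A, C, G, T).
logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
targets = [2, 1]
loss = combined_loss(logits, targets, [1.0, 0.0], [0.0, 1.0], latent_weight=0.5)
```

With uniform logits the token term is log 4 per position, and orthogonal CLS embeddings give a cosine distance of 1, so the toy loss is the sum of the two contributions. In an actual training loop the two terms would be computed on batched tensors by an autodiff framework; this pure-Python version only shows how the terms combine.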
Taxonomy
Topics: Genomics and Chromatin Dynamics · Generative Adversarial Networks and Image Synthesis · Machine Learning in Bioinformatics
