BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics

Taj Gillin; Adam Lalani; Kenneth Zhang; Marcel Mateos Salles

arXiv:2601.00366 · cs.CL · January 5, 2026


TL;DR

BERT-JEPA enhances BERT-style models with a JEPA-based training objective to create language-invariant embeddings, improving multilingual performance by reorganizing [CLS] embeddings into a language-agnostic space.

Contribution

This paper introduces BERT-JEPA (BEPA), a training paradigm that adds a JEPA objective to BERT-style models to produce language-invariant [CLS] embeddings, counteracting the collapse of the [CLS] embedding space.
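The summary does not say how collapse is measured. One common diagnostic, offered here as an illustrative assumption rather than the paper's metric, is the mean pairwise cosine similarity among [CLS] vectors: values near 1 mean the embeddings crowd into a narrow cone. A minimal pure-Python sketch; the function names and toy vectors are hypothetical:

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered pairs of embeddings.

    Hypothetical collapse diagnostic (not taken from the BERT-JEPA paper):
    values near 1.0 mean all vectors point the same way (a collapsed
    space); lower values mean a more spread-out space.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D "[CLS]" vectors with made-up values, for illustration only.
collapsed = [[1.0, 0.1], [1.0, 0.12], [0.98, 0.1]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(mean_pairwise_cosine(collapsed))
print(mean_pairwise_cosine(spread))
```

Real [CLS] vectors have hundreds of dimensions, but the same function applies unchanged; in practice one would compute it over a sample of sentences drawn from several languages.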

Findings

01. Improved performance across multilingual benchmarks

02. Reorganized [CLS] embeddings into a language-agnostic space

03. Demonstrated the effectiveness of a JEPA objective for BERT-style language models

Abstract

Joint Embedding Predictive Architectures (JEPA) are a novel self-supervised training approach that has shown recent promise across domains. We introduce BERT-JEPA (BEPA), a training paradigm that adds a JEPA training objective to BERT-style models to counteract a collapsed [CLS] embedding space and turn it into a language-agnostic space. This new structure leads to increased performance across multilingual benchmarks.
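The abstract does not spell out the objective. In JEPA work generally, a predictor maps the embedding of a context view to the embedding a target encoder produces for a related view, the two are compared in embedding space, no gradient flows into the target, and the target encoder's weights track the context encoder through an exponential moving average (EMA). The sketch below illustrates that generic recipe on [CLS]-like vectors; it is not BEPA's published method. The pairing of a sentence with its translation, the linear predictor, the squared-error distance, and the names `linear`, `jepa_loss`, `predictor_step`, and `ema_update` are all hypothetical assumptions:

```python
def linear(weights, x):
    """Apply a weight matrix (a list of rows) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def jepa_loss(predictor, context_cls, target_cls):
    """Mean squared error between the predicted and target embeddings.

    target_cls is treated as a constant (stop-gradient): in a real
    framework it would come from the EMA target encoder under no-grad.
    """
    pred = linear(predictor, context_cls)
    return sum((p - t) ** 2 for p, t in zip(pred, target_cls)) / len(pred)


def predictor_step(predictor, context_cls, target_cls, lr=0.1):
    """One gradient-descent step on the predictor weights for jepa_loss.

    d(loss)/dW[i][j] = 2/n * (pred[i] - target[i]) * context[j];
    the target is held fixed, mimicking stop-gradient.
    """
    pred = linear(predictor, context_cls)
    n = len(pred)
    return [[w - lr * 2.0 / n * (pred[i] - target_cls[i]) * context_cls[j]
             for j, w in enumerate(row)]
            for i, row in enumerate(predictor)]


def ema_update(target_params, online_params, momentum=0.99):
    """Move each (flattened) target parameter toward its online counterpart."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_params, online_params)]


# Toy usage with made-up numbers (hypothetical sentence/translation pair).
predictor = [[1.0, 0.0], [0.0, 1.0]]
context_cls = [0.5, -0.2]  # e.g. [CLS] of a sentence from the context encoder
target_cls = [0.4, -0.1]   # e.g. [CLS] of its translation from the target encoder
print(jepa_loss(predictor, context_cls, target_cls))
predictor = predictor_step(predictor, context_cls, target_cls)
print(jepa_loss(predictor, context_cls, target_cls))
print(ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9))
```

Holding the target fixed and moving it only through the EMA is one of the mechanisms JEPA-style methods use to avoid the trivial solution of mapping every input to the same vector; this summary does not state whether BEPA relies on the same mechanism.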

