BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics
Taj Gillin, Adam Lalani, Kenneth Zhang, Marcel Mateos Salles

TL;DR
BERT-JEPA enhances BERT models with a JEPA-based training objective to create language-invariant embeddings, improving multilingual performance by reorganizing CLS embeddings into a language-agnostic space.
Contribution
This paper introduces BERT-JEPA, a training paradigm that adds a JEPA objective to BERT-style models to produce language-invariant [CLS] embeddings and counter the collapse of the [CLS] embedding space.
Findings
Improved multilingual benchmark performance
Reorganized [CLS] embeddings into a language-agnostic space
Demonstrated effectiveness of JEPA in language models
Abstract
Joint Embedding Predictive Architectures (JEPA) are a novel self-supervised training technique that has shown recent promise across domains. We introduce BERT-JEPA (BEPA), a training paradigm that adds a JEPA training objective to BERT-style models, working to combat a collapsed [CLS] embedding space and turn it into a language-agnostic space. This new structure leads to increased performance across multilingual benchmarks.
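The abstract does not specify the loss or how collapse is measured, so the following is only an illustrative sketch, not the authors' implementation. It assumes two things that are not stated in the source: that collapse can be gauged by the mean pairwise cosine similarity of [CLS] vectors, and that the JEPA objective is a mean squared error between a predicted embedding and a target embedding held fixed (a stop-gradient in a real framework). The function names `mean_pairwise_cosine` and `jepa_style_loss` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Assumed collapse diagnostic (not from the paper).

    Values near 1.0 mean every vector points in almost the same
    direction, which is one way to describe a collapsed space.
    """
    n = len(embeddings)
    sims = [
        cosine(embeddings[i], embeddings[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(sims) / len(sims)


def jepa_style_loss(predicted, target):
    """Assumed JEPA-style objective: MSE in embedding space.

    `predicted` stands in for a predictor's output on one view
    (for example a sentence in one language) and `target` for the
    embedding of the paired view, treated here as a constant.
    """
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy 2-D stand-ins for [CLS] vectors (hypothetical data).
collapsed = [[1.0, 0.01], [1.0, -0.01], [1.0, 0.02]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

collapsed_score = mean_pairwise_cosine(collapsed)  # close to 1.0
spread_score = mean_pairwise_cosine(spread)        # well below 1.0
loss = jepa_style_loss([[0.0, 0.0]], [[1.0, 1.0]])
```

In this toy setup the near-parallel vectors score close to 1.0 while the spread-out vectors score much lower, which is the kind of contrast a collapse diagnostic is meant to expose. The loss is zero only when the predicted and target embeddings coincide.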
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Machine Learning in Healthcare
