New Encoders for German Trained from Scratch: Comparing ModernGBERT with Converted LLM2Vec Models
Julia Wunderle, Anton Ehrmanntraut, Jan Pfister, Fotis Jannidis, Andreas Hotho

TL;DR
This paper compares training German encoders from scratch versus converting decoders, introducing ModernGBERT and LLäMmleinVec, and finds that from-scratch models excel in efficiency and latency, while conversion is viable with limited compute.
Contribution
The study introduces two German encoder resources, ModernGBERT and LLäMmleinVec, and provides a comprehensive comparison of training from scratch versus converting decoders, offering practical guidance.
Findings
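ModernGBERT 1B sets a new state of the art on SuperGLEBer (avg 0.808), ahead of the converted 7B model (0.787).
On German MTEB after supervised fine-tuning, the converted 7B model scores slightly higher than ModernGBERT 1B (0.557 vs 0.551).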
From-scratch encoders outperform conversion when efficiency and latency are priorities.
Abstract
Encoders remain essential for efficient German NLP and NLU scenarios despite the rise of decoder-only LLMs. This work studies two routes to high-quality German encoders under identical data and training constraints: 1) training from scratch and 2) converting decoders via LLM2Vec. We introduce two resources: ModernGBERT (134M, 1B), fully transparent German encoders in the ModernBERT style, and LLäMmleinVec (120M, 1B, 7B), decoder-to-encoder conversions trained with masked next-token prediction, both undergoing a context extension to 8,192 tokens. Across SuperGLEBer, ModernGBERT 1B sets a new state of the art (avg 0.808), surpassing GBERT Large (+4%) and the seven-times larger converted 7B model (0.787). On German MTEB after supervised fine-tuning, ModernGBERT 1B (0.551) approaches the converted 7B model (0.557). We release all models, checkpoints, datasets, and full training…
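The abstract names masked next-token prediction as the objective used for the decoder-to-encoder conversions. The following is a minimal sketch of how such training targets could be built, assuming the LLM2Vec convention that a token masked at position i is predicted from the model output at position i - 1. The function name `mntp_examples`, the `[MASK]` symbol, and the masking rate are hypothetical choices for illustration and are not taken from the paper.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def mntp_examples(tokens, mask_prob=0.2, seed=0):
    """Build masked next-token prediction targets for one token sequence.

    Returns (masked_tokens, targets). `targets` maps the position whose
    output is scored (i - 1) to the original token at masked position i.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    # Position 0 is never masked: there is no earlier position to predict it from.
    for i in range(1, len(tokens)):
        if rng.random() < mask_prob:
            masked[i] = MASK
            targets[i - 1] = tokens[i]
    return masked, targets


# Mask every eligible position so the shifted alignment is easy to see.
tokens = ["Der", "Hund", "läuft", "schnell"]
masked, targets = mntp_examples(tokens, mask_prob=1.0)
print(masked)
print(targets)
```

With `mask_prob=1.0` every token except the first is masked, and each target is keyed by the position just before the masked one. This differs from standard masked language modeling, where the loss is read at the masked position itself.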
Taxonomy
Topics: Natural Language Processing Techniques · Adversarial Robustness in Machine Learning · Topic Modeling
