LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch
Jan Pfister, Julia Wunderle, Andreas Hotho

TL;DR
This paper introduces two German-only decoder models, LLäMmlein 120M and 1B, created from scratch with transparent training processes, and demonstrates their competitive performance on benchmarks, providing valuable insights for future NLP model development.
Contribution
The paper presents the creation and open release of German-only language models from scratch, including training data, custom tokenizer, and evaluation, with insights into their learning dynamics and resource efficiency.
Findings
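Both models perform competitively on the SuperGLEBer benchmark, matching or surpassing models of similar parameter size.
Model quality scales with size, but improvements on some tasks plateau early.
Checkpoints saved during training and evaluated on SuperGLEBer give insight into the models' learning dynamics.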
Abstract
We create two German-only decoder models, LLäMmlein 120M and 1B, transparently from scratch and publish them, along with the training data, for the German NLP research community to use. The model training involved several key steps, including extensive data preprocessing, the creation of a custom German tokenizer, the training itself, as well as the evaluation of the final models on various benchmarks. Throughout the training process, multiple checkpoints were saved and analyzed using the SuperGLEBer benchmark to monitor the models' learning dynamics. Compared to state-of-the-art models on the SuperGLEBer benchmark, both LLäMmlein models performed competitively, consistently matching or surpassing models with similar parameter sizes. The results show that the models' quality scales with size as expected, but performance improvements on some tasks plateaued early, offering valuable…
Code & Models
- 🤗 LSX-UniWue/LLaMmlein_1B (model) · 856 dl · ♡ 2
- 🤗 LSX-UniWue/LLaMmlein_120M_prerelease (model) · 4 dl · ♡ 4
- 🤗 LSX-UniWue/LLaMmlein_1B_prerelease (model) · 59 dl · ♡ 14
- 🤗 LSX-UniWue/Betzerl_1B_wiki_preview (model) · 3 dl
- 🤗 LSX-UniWue/LLaMmlein_1B_chat_guanako (model) · 1 dl
- 🤗 LSX-UniWue/LLaMmlein_1B_chat_sharegpt (model) · 3 dl · ♡ 1
- 🤗 LSX-UniWue/LLaMmlein_1B_chat_alpaca (model) · 2 dl · ♡ 1
- 🤗 LSX-UniWue/LLaMmlein_1B_chat_evol_instruct (model) · ♡ 1
- 🤗 LSX-UniWue/LLaMmlein_1B_chat_all (model) · 4 dl
- 🤗 LSX-UniWue/LLaMmlein_1B_chat_selected (model) · 15 dl · ♡ 1
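The repositories listed above are hosted on the Hugging Face Hub. A minimal sketch of how one of them might be used for text completion follows. It assumes that `transformers` and `torch` are installed and that the repository loads through the generic `Auto*` classes, which this listing does not confirm. The helper name `complete` and its parameters are illustrative only.

```python
# Sketch only: assumes `transformers` and `torch` are installed and that the
# repository loads through the generic Auto* classes (an assumption).
MODEL_ID = "LSX-UniWue/LLaMmlein_1B"  # repository name taken from the list above


def complete(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 32) -> str:
    """Return the prompt followed by a greedy continuation (hypothetical helper)."""
    # Imported lazily so that defining this function needs no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")

    # Pass only the two tensors that generate() needs, so any extra tokenizer
    # outputs are not forwarded to the model.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding for reproducible output
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("Die Hauptstadt von Bayern ist")` downloads the weights on first use. Passing a different `model_id` from the list, such as one of the chat variants, should work the same way under the same assumptions.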
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
