SnakModel: Lessons Learned from Training an Open Danish Large Language Model
Mike Zhang, Max Müller-Eberstein, Elisa Bassignana, Rob van der Goot

TL;DR
SnakModel is a Danish large language model built on Llama2-7B, trained on a curated Danish corpus, and evaluated across multiple tasks, providing insights and best practices for developing LLMs in smaller language communities.
Contribution
The paper introduces SnakModel, a Danish LLM, and systematically analyzes training decisions and their impact, establishing guidelines for resource-constrained language model development.
Findings
SnakModel outperforms other Llama2-7B-based models on Danish tasks.
Training decisions significantly affect downstream performance.
Open-sourcing the model and data promotes further research.
Abstract
We present SnakModel, a Danish large language model (LLM) based on Llama2-7B, which we continuously pre-train on 13.6B Danish words, and further tune on 3.7M Danish instructions. As best practices for creating LLMs for smaller language communities have yet to be established, we examine the effects of early modeling and training decisions on downstream performance throughout the entire training pipeline, including (1) the creation of a strictly curated corpus of Danish text from diverse sources; (2) the language modeling and instruction-tuning training process itself, including the analysis of intermediate training dynamics, and ablations across different hyperparameters; (3) an evaluation on eight language and culturally-specific tasks. Across these experiments SnakModel achieves the highest overall performance, outperforming multiple contemporary Llama2-7B-based models. By making…
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies
