ChocoLlama: Lessons Learned From Teaching Llamas Dutch
Matthieu Meeus, Anthony Rathé, François Remy, Pieter Delobelle, Jens-Joris Decorte, Thomas Demeester

TL;DR
This paper investigates methods for adapting English-centric LLMs, specifically Llama-2 and Llama-3, to Dutch by collecting Dutch data, applying continued pretraining with LoRA, and experimenting with tokenizers, revealing insights into effective language adaptation strategies.
Contribution
The study compares adaptation techniques for Llama models to Dutch, highlighting the effectiveness of LoRA and tokenizer modifications, and provides a new Dutch benchmark for LLM evaluation.
Findings
LoRA effectively scales for language adaptation
Training a Dutch-specific tokenizer, combined with embedding reinitialization, improves performance
Llama-3 outperforms adapted Llama-2 in Dutch capabilities
Abstract
While Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding and generation, their performance often lags in lower-resource, non-English languages due to biases in the training data. In this work, we explore strategies for adapting the primarily English LLMs (Llama-2 and Llama-3) to Dutch, a language spoken by 30 million people worldwide yet often underrepresented in LLM development. We collect 104GB of Dutch text (B tokens) from various sources to first apply continued pretraining using low-rank adaptation (LoRA), complemented with Dutch posttraining strategies provided by prior work. For Llama-2, we consider using (i) the tokenizer of the original model, and (ii) training a new, Dutch-specific tokenizer combined with embedding reinitialization. We evaluate our adapted models, ChocoLlama-2, both on standard benchmarks and a novel Dutch…
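The abstract names continued pretraining with low-rank adaptation (LoRA) without spelling out the mechanism. The sketch below illustrates the general idea under the standard LoRA formulation: the pretrained weight stays frozen and only a scaled low-rank product is trained on top of it. It is not the authors' training code. All names (`lora_forward`, `trainable_fraction`, `a_down`, `b_up`), the toy dimensions, and the `alpha` value are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w_frozen, a_down, b_up, alpha):
    """Return x @ W + (alpha / r) * (x @ A) @ B.

    w_frozen: (d_in, d_out) pretrained weight, never updated.
    a_down:   (d_in, r) trainable down-projection.
    b_up:     (r, d_out) trainable up-projection.
    """
    rank = len(b_up)
    scale = alpha / rank
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, a_down), b_up)
    return [[p + scale * q for p, q in zip(base_row, upd_row)]
            for base_row, upd_row in zip(base, update)]


def trainable_fraction(d_in, d_out, rank):
    """Fraction of a dense layer's parameter count that the adapter trains."""
    return rank * (d_in + d_out) / (d_in * d_out)


# Toy setup (assumed sizes): a 4x3 frozen layer with a rank-2 adapter.
random.seed(0)
d_in, d_out, rank = 4, 3, 2
w_frozen = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
a_down = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
b_up = [[0.0] * d_out for _ in range(rank)]  # zero init: adapter starts as a no-op

x = [[1.0, 2.0, 3.0, 4.0]]
y = lora_forward(x, w_frozen, a_down, b_up, alpha=4)
```

Because `b_up` starts at zero, the adapted layer initially reproduces the frozen layer exactly, so continued pretraining begins from the original model's behavior. For a hypothetical 4096 x 4096 projection with rank 8, `trainable_fraction(4096, 4096, 8)` comes to 1/256 of the dense layer's parameters.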
Code & Models
- 🤗 ChocoLlama/ChocoLlama-2-7B-tokentrans-base (model) · 5 dl
- 🤗 ChocoLlama/ChocoLlama-2-7B-base (model) · 8 dl · ♡ 2
- 🤗 ChocoLlama/ChocoLlama-2-7B-instruct (model) · 8 dl · ♡ 2
- 🤗 ChocoLlama/ChocoLlama-2-7B-tokentrans-instruct (model) · 14 dl · ♡ 1
- 🤗 ChocoLlama/Llama-3-ChocoLlama-8B-base (model) · 8 dl · ♡ 1
- 🤗 ChocoLlama/Llama-3-ChocoLlama-8B-instruct (model) · 11 dl · ♡ 6
- 🤗 RichardErkhov/ChocoLlama_-_ChocoLlama-2-7B-instruct-8bits (model) · 2 dl
- 🤗 RichardErkhov/ChocoLlama_-_ChocoLlama-2-7B-base-8bits (model) · 3 dl
- 🤗 RichardErkhov/ChocoLlama_-_Llama-3-ChocoLlama-8B-instruct-8bits (model)
- 🤗 RichardErkhov/ChocoLlama_-_Llama-3-ChocoLlama-8B-instruct-awq (model)
