TL;DR
This paper introduces AfriqueLLM, a suite of open multilingual models for 20 African languages, and shows how data composition and architecture choices influence performance after continued pre-training (CPT).
Contribution
It systematically analyzes the impact of data mixtures and model architectures on CPT outcomes for African languages, providing empirical insights and releasing the models publicly.
Findings
Data composition is the main factor driving CPT improvements.
Adding math, code, and synthetic data enhances reasoning abilities.
Model architecture choices often outweigh scale in performance outcomes.
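To make the role of data composition concrete, here is a minimal sketch of how a CPT token budget might be split across data sources. The 26B-token total comes from the abstract; the source names and the 70/10/10/10 split are hypothetical placeholders, not the mixture used in the paper, and `allocate_token_budget` is an illustrative helper rather than part of any released code.

```python
from typing import Dict


def allocate_token_budget(parts: Dict[str, int], total_tokens: int) -> Dict[str, int]:
    """Split total_tokens across sources in proportion to integer mixture parts."""
    if not parts:
        raise ValueError("at least one data source is required")
    if any(p < 0 for p in parts.values()):
        raise ValueError("mixture parts must be non-negative")
    total_parts = sum(parts.values())
    if total_parts == 0:
        raise ValueError("mixture parts must not all be zero")

    # Integer floor division keeps the arithmetic exact for large budgets.
    budget = {name: total_tokens * p // total_parts for name, p in parts.items()}

    # Hand any rounding remainder to the largest source so the parts sum to the total.
    remainder = total_tokens - sum(budget.values())
    largest = max(parts, key=parts.get)
    budget[largest] += remainder
    return budget


# Hypothetical mixture (assumption): names and proportions are for illustration only.
HYPOTHETICAL_MIXTURE = {
    "african_language_text": 70,
    "math": 10,
    "code": 10,
    "synthetic": 10,
}
TOTAL_TOKENS = 26_000_000_000  # 26B tokens, as stated in the abstract

budget = allocate_token_budget(HYPOTHETICAL_MIXTURE, TOTAL_TOKENS)
for source, tokens in budget.items():
    print(f"{source}: {tokens / 1e9:.1f}B tokens")
```

Changing the parts in `HYPOTHETICAL_MIXTURE` while holding `TOTAL_TOKENS` fixed mirrors the kind of controlled comparison the findings describe: the budget stays constant and only the composition varies.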
Abstract
Large language models (LLMs) are increasingly multilingual, yet open models continue to underperform relative to proprietary systems, with the gap most pronounced for African languages. Continued pre-training (CPT) offers a practical route to language adaptation, but improvements on demanding capabilities such as mathematical reasoning often remain limited. This limitation is driven in part by the uneven domain coverage and missing task-relevant knowledge that characterize many low-resource language corpora. We present AfriqueLLM, a suite of open LLMs adapted to 20 African languages through CPT on 26B tokens. We perform a comprehensive empirical study across five base models spanning sizes and architectures, including Llama 3.1, Gemma 3, and Qwen 3, and systematically analyze how CPT data composition shapes downstream performance. In particular, we vary mixtures that include…
Code & Models
- 🤗 McGill-NLP/AfriqueGemma-4B · model · 679 dl
- 🤗 McGill-NLP/AfriqueGemma-12B · model · 591 dl · ♡ 1
- 🤗 McGill-NLP/AfriqueLlama-8B · model · 686 dl
- 🤗 McGill-NLP/AfriqueQwen-8B · model · 1.2k dl · ♡ 2
- 🤗 McGill-NLP/AfriqueQwen-14B · model · 1.8k dl · ♡ 3
- 🤗 McGill-NLP/AfriqueQwen-4B · model · 22 dl
- 🤗 McGill-NLP/AfriqueQwen3.5-4B · model · 111 dl
- 🤗 McGill-NLP/AfriqueQwen3.5-4B-ExtendedCM · model · 58 dl
