Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect
Guokan Shang, Hadi Abdine, Yousef Khoubrane, Amr Mohamed, Yassine, Abbahaddou, Sofiane Ennadir, Imane Momayiz, Xuguang Ren, Eric Moulines,, Preslav Nakov, Michalis Vazirgiannis, Eric Xing

TL;DR
Atlas-Chat introduces specialized large language models for Moroccan Arabic dialect, demonstrating superior performance in instruction-following and NLP tasks through novel datasets and fine-tuning strategies, filling a gap in low-resource language modeling.
Contribution
The paper presents the first LLMs tailored for Moroccan Arabic, with new datasets, training methods, and evaluation benchmarks for low-resource dialects.
Findings
Models outperform state-of-the-art Arabic LLMs
9B model improves 13% over larger models on DarijaMMLU
Fine-tuning strategies significantly impact performance
Abstract
We introduce Atlas-Chat, the first-ever collection of LLMs specifically developed for dialectal Arabic. Focusing on Moroccan Arabic, also known as Darija, we construct our instruction dataset by consolidating existing Darija language resources, creating novel datasets both manually and synthetically, and translating English instructions with stringent quality control. Atlas-Chat-2B, 9B, and 27B models, fine-tuned on the dataset, exhibit superior ability in following Darija instructions and performing standard NLP tasks. Notably, our models outperform both state-of-the-art and Arabic-specialized LLMs like LLaMa, Jais, and AceGPT, e.g., our 9B model gains a 13% performance boost over a larger 13B model on DarijaMMLU, in our newly introduced evaluation suite for Darija covering both discriminative and generative tasks. Furthermore, we perform an experimental analysis of various fine-tuning…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗MBZUAI-Paris/Atlas-Chat-9Bmodel· 843 dl· ♡ 28843 dl♡ 28
- 🤗MBZUAI-Paris/Atlas-Chat-2Bmodel· 369 dl· ♡ 26369 dl♡ 26
- 🤗MBZUAI-Paris/Atlas-Chat-27Bmodel· 73 dl· ♡ 1673 dl♡ 16
- 🤗QuantFactory/Atlas-Chat-9B-GGUFmodel· 352 dl· ♡ 2352 dl♡ 2
- 🤗QuantFactory/Atlas-Chat-2B-GGUFmodel· 36 dl· ♡ 136 dl♡ 1
- 🤗RichardErkhov/MBZUAI-Paris_-_Atlas-Chat-2B-ggufmodel· 220 dl220 dl
- 🤗RichardErkhov/MBZUAI-Paris_-_Atlas-Chat-9B-ggufmodel· 169 dl· ♡ 1169 dl♡ 1
- 🤗RichardErkhov/MBZUAI-Paris_-_Atlas-Chat-27B-ggufmodel· 69 dl69 dl
- 🤗RichardErkhov/MBZUAI-Paris_-_Atlas-Chat-2B-4bitsmodel· 3 dl3 dl
- 🤗RichardErkhov/MBZUAI-Paris_-_Atlas-Chat-2B-8bitsmodel· 2 dl2 dl
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsLanguage, Linguistics, Cultural Analysis · Natural Language Processing Techniques · Linguistic Studies and Language Acquisition
MethodsBalanced Selection
