Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning
Jesujoba O. Alabi, David Ifeoluwa Adelani, Marius Mosbach, Dietrich Klakow

TL;DR
This paper introduces a multilingual adaptive fine-tuning method for African languages that enhances cross-lingual transfer and reduces model size, outperforming individual language adaptation on multiple NLP tasks.
Contribution
The paper proposes a multilingual adaptive fine-tuning approach for African languages that improves transfer learning and reduces model size compared to language-specific fine-tuning.
Findings
Multilingual adaptive fine-tuning outperforms individual language fine-tuning.
Model size is reduced by approximately 50% through vocabulary token removal.
Zero-shot cross-lingual transfer on downstream NLP tasks improves.
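The vocabulary reduction mentioned in the findings can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the list of kept scripts (`KEPT_SCRIPTS`), the helper names `uses_kept_script` and `trim_vocabulary`, and the toy vocabulary are all hypothetical. The idea is to drop tokens written in scripts that are not needed, along with their embedding rows, and to renumber the remaining tokens.

```python
import unicodedata

# Assumption: an illustrative set of scripts to keep, not the paper's exact list.
KEPT_SCRIPTS = ("LATIN", "ETHIOPIC", "ARABIC")


def uses_kept_script(token):
    """Return True if every alphabetic character belongs to a kept script."""
    for ch in token:
        if ch.isalpha():
            # Unicode character names start with the script name,
            # e.g. "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER DE".
            name = unicodedata.name(ch, "")
            if not name.startswith(KEPT_SCRIPTS):
                return False
    return True


def trim_vocabulary(vocab, embeddings):
    """Drop tokens in other scripts together with their embedding rows.

    vocab maps token -> row index; embeddings is a list of rows.
    Returns a new vocab with contiguous ids and the matching rows.
    """
    new_vocab, new_rows = {}, []
    for token, idx in sorted(vocab.items(), key=lambda kv: kv[1]):
        if uses_kept_script(token):
            new_vocab[token] = len(new_rows)
            new_rows.append(embeddings[idx])
    return new_vocab, new_rows


# Toy vocabulary (hypothetical): Latin, Cyrillic, Ethiopic, CJK, Arabic, digits.
toy_vocab = {"▁ile": 0, "▁дом": 1, "▁ቤት": 2, "家": 3, "▁بيت": 4, "123": 5}
toy_embeddings = [[float(i)] * 4 for i in range(len(toy_vocab))]

small_vocab, small_embeddings = trim_vocabulary(toy_vocab, toy_embeddings)
```

In this toy case the Cyrillic and CJK tokens are dropped and four of the six rows remain. In a large multilingual model the embedding matrix accounts for a large share of the parameters, which is why trimming it can shrink the model substantially.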
Abstract
Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is language adaptive fine-tuning (LAFT): fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to a target language individually takes a large disk space and limits the cross-lingual transfer abilities of the resulting models because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning on 17 most-resourced African languages and three other high-resource languages widely spoken on the African…
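As a rough illustration of what "fine-tuning using the pre-training objective" can involve, the sketch below prepares one training example for masked language modeling. It assumes the pre-training objective is BERT-style masked language modeling with the common 15% selection rate and 80/10/10 replacement split. The excerpt above does not state the objective or these rates, and `MASK_ID`, `IGNORE`, and `mask_tokens` are hypothetical names.

```python
import random

MASK_ID = 4      # hypothetical id of the mask token
IGNORE = -100    # label value meaning "do not compute loss at this position"


def mask_tokens(token_ids, vocab_size, rng, mask_prob=0.15):
    """Build (inputs, labels) for masked language modeling.

    Each position is selected with probability mask_prob. A selected
    position keeps its original id as the label, and its input becomes
    the mask id (80%), a random id (10%), or stays unchanged (10%).
    """
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected; no loss here
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise the original token is left in place
    return inputs, labels


rng = random.Random(0)                # fixed seed for repeatability
sentence_ids = list(range(10, 60))    # stand-in for a tokenized sentence
masked_inputs, mlm_labels = mask_tokens(sentence_ids, vocab_size=1000, rng=rng)
```

Under these assumptions, adapting to one language would feed such examples from that language's monolingual text, while the multilingual variant would draw them from the pooled text of all target languages.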
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
