Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation
Kevin Glocker, Kätriin Kukk, Romina Oji, Marcel Bollmann, Marco Kuhlmann, Jenny Kunz

TL;DR
This paper shows that upscaling pretrained language models makes their adaptation to medium- and lower-resource languages more data-efficient and helps preserve their English capabilities; it also offers insights into merging language-specific models into multilingual systems.
Contribution
The study shows that larger, upscaled models enable more effective language adaptation and merging, providing a scalable strategy for multilingual model development.
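Several models listed under Code & Models carry a "-cloned" suffix, which suggests that upscaling initializes a wider model by cloning a smaller model's weights. The exact procedure is not described in this summary, so the following is only a minimal sketch, assuming a symmetric cloning scheme similar in spirit to HyperCloning: a linear layer is doubled in both dimensions so that a duplicated input produces a duplicated output, preserving the base model's function at initialization. The helper names (`linear`, `duplicate`, `clone_linear`) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows of length in_dim
Vector = List[float]


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute y = W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def duplicate(v: Vector) -> Vector:
    """Represent a hidden vector in the doubled width by stacking two copies: [v; v]."""
    return v + v


def clone_linear(w: Matrix, b: Vector) -> Tuple[Matrix, Vector]:
    """Expand an (out x in) layer to (2*out x 2*in).

    Every block of the new matrix is W / 2, so
    [W/2 W/2; W/2 W/2] @ [x; x] = [W x; W x], and the bias is stacked twice.
    """
    half_rows = [[v / 2 for v in row] * 2 for row in w]  # [W/2, W/2] side by side
    w_big = half_rows + [list(row) for row in half_rows]   # stacked vertically
    b_big = b + b
    return w_big, b_big
```

Because elementwise activations such as ReLU or GELU map a duplicated vector to a duplicated vector, applying this expansion layer by layer keeps the widened network's outputs equal to copies of the original ones; how normalization and attention layers are handled in the paper's models is not covered by this sketch.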
Findings
Upscaled models match or outperform smaller models continually pretrained on much more target-language data.
Scaling helps preserve the base model's English performance, reducing catastrophic forgetting.
Merging larger models yields better results than merging smaller ones, though there is room for improved merging methods.
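The summary does not name the merging method used. As a hedged illustration only, here are two common ways to merge models that share an architecture and a base model: linear weight averaging, and task arithmetic, which adds scaled differences from the shared base. Models are represented as hypothetical state dicts mapping parameter names to flat lists of floats; the function names are illustrative, not the paper's code.

```python
from typing import Dict, List, Sequence

StateDict = Dict[str, List[float]]  # hypothetical: parameter name -> flattened values


def _check_same_keys(models: Sequence[StateDict]) -> None:
    """Merging parameter-by-parameter requires identical parameter names."""
    keys = set(models[0])
    for m in models[1:]:
        if set(m) != keys:
            raise ValueError("all models must have the same parameter names")


def linear_merge(models: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Weighted average of parameters; weights are normalized to sum to 1."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    _check_same_keys(models)
    total = float(sum(weights))
    return {
        name: [
            sum(w * m[name][i] for m, w in zip(models, weights)) / total
            for i in range(len(models[0][name]))
        ]
        for name in models[0]
    }


def task_arithmetic_merge(base: StateDict, models: Sequence[StateDict], scale: float) -> StateDict:
    """Compute base + scale * sum_i (model_i - base) for every parameter."""
    _check_same_keys([base, *models])
    return {
        name: [
            b + scale * sum(m[name][i] - b for m in models)
            for i, b in enumerate(base[name])
        ]
        for name in base
    }
```

With `scale = 1 / len(models)`, task arithmetic reduces to uniform averaging; larger scales keep more of each language-specific update at the risk of interference between them.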
Abstract
Achieving high-performing language models which include medium- and lower-resource languages remains a challenge. Massively multilingual models still underperform compared to language-specific adaptations, especially at smaller model scales. In this work, we investigate scaling as an efficient strategy for adapting pretrained models to new target languages. Through comprehensive scaling ablations with approximately FLOP-matched models, we test whether upscaling an English base model enables more effective and resource-efficient adaptation than standard continued pretraining. We find that, once exposed to sufficient target-language data, larger upscaled models can match or surpass the performance of smaller models continually pretrained on much more data, demonstrating the benefits of scaling for data efficiency. Scaling also helps preserve the base model's capabilities in English, thus…
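The abstract compares approximately FLOP-matched models. One common back-of-the-envelope way to FLOP-match training runs, assumed here rather than taken from the paper, is the rule of thumb that training a dense transformer with N parameters on D tokens costs about 6·N·D FLOPs. Under that assumption, a larger model given the same compute sees proportionally fewer tokens. The sketch uses the 180M and 572M sizes from the model names under Code & Models; the 1B-token budget is hypothetical.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the common C ~ 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def flop_matched_tokens(n_params_small: float, tokens_small: float, n_params_large: float) -> float:
    """Tokens a larger model can train on with the same approximate compute as the small one."""
    budget = training_flops(n_params_small, tokens_small)
    return budget / (6.0 * n_params_large)


if __name__ == "__main__":
    small, large = 180e6, 572e6  # parameter counts taken from the model names
    tokens_small = 1e9           # hypothetical target-language token budget
    tokens_large = flop_matched_tokens(small, tokens_small, large)
    print(f"{tokens_large / 1e6:.1f}M tokens for the 572M model")  # prints: 314.7M tokens for the 572M model
```

The factor of 6 cancels, so the matched token count is simply D_small · N_small / N_large; the rule ignores attention-score FLOPs, which is one reason such matching is only approximate.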
Code & Models
- 🤗 liu-nlp/hyperllama-180m-english-1x (model) · 23 downloads
- 🤗 liu-nlp/hyperllama-180m-estonian-1x (model) · 5 downloads
- 🤗 liu-nlp/hyperllama-180m-faroese-1x (model) · 3 downloads
- 🤗 liu-nlp/hyperllama-180m-icelandic-1x (model) · 4 downloads
- 🤗 liu-nlp/hyperllama-180m-multilingual-1x (model) · 7 downloads · 1 like
- 🤗 liu-nlp/hyperllama-180m-persian-1x (model) · 32 downloads
- 🤗 liu-nlp/hyperllama-180m-swedish-1x (model) · 4 downloads · 1 like
- 🤗 liu-nlp/hyperllama-572m-english-1x-cloned (model) · 24 downloads
- 🤗 liu-nlp/hyperllama-572m-estonian-1x-cloned (model) · 10 downloads
- 🤗 liu-nlp/hyperllama-572m-estonian-1x-cloned-matching-1x (model) · 10 downloads
Taxonomy
Topics: ICT in Developing Communities · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
