Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation

Kevin Glocker; Kätriin Kukk; Romina Oji; Marcel Bollmann; Marco Kuhlmann; Jenny Kunz

arXiv:2512.10772 · cs.CL · December 12, 2025

TL;DR

This paper shows that upscaling a pretrained English language model makes adaptation to medium- and lower-resource target languages more data-efficient, helps preserve English capabilities, and offers insights into merging language-specific models into multilingual systems.

Contribution

The study shows that larger, upscaled models enable more effective language adaptation and merging, providing a scalable strategy for multilingual model development.

Findings

01

Once exposed to sufficient target-language data, larger upscaled models match or outperform smaller models that were continually pretrained on much more data.

02

Scaling helps preserve English performance, reducing catastrophic forgetting.

03

Merging larger models yields better results than smaller ones, with room for improved methods.
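
To make the idea of merging language-specific models concrete, here is a minimal sketch of one simple baseline: a weighted linear average of parameters. This page does not say which merging methods the paper uses, so the technique, the `merge_linear` function, and the toy parameter dictionaries are all assumptions for illustration only.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
# Real checkpoints hold tensors; this is a hypothetical simplification.
StateDict = Dict[str, List[float]]


def merge_linear(models: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Weighted linear average of parameters (an assumed baseline, not the paper's method)."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    merged: StateDict = {}
    for name, reference in models[0].items():
        # All models must share the same architecture for averaging to make sense.
        if any(name not in m or len(m[name]) != len(reference) for m in models):
            raise ValueError(f"parameter {name!r} is missing or has a mismatched shape")
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(len(reference))
        ]
    return merged


# Hypothetical language-adapted checkpoints with identical shapes.
model_a = {"layer.weight": [1.0, 3.0]}
model_b = {"layer.weight": [3.0, 5.0]}
merged = merge_linear([model_a, model_b], weights=[1.0, 1.0])
print(merged)
```

Averaging of this kind only applies when the models share an architecture and a common starting point, which is the case for models adapted from the same base.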

Abstract

Achieving high-performing language models which include medium- and lower-resource languages remains a challenge. Massively multilingual models still underperform compared to language-specific adaptations, especially at smaller model scales. In this work, we investigate scaling as an efficient strategy for adapting pretrained models to new target languages. Through comprehensive scaling ablations with approximately FLOP-matched models, we test whether upscaling an English base model enables more effective and resource-efficient adaptation than standard continued pretraining. We find that, once exposed to sufficient target-language data, larger upscaled models can match or surpass the performance of smaller models continually pretrained on much more data, demonstrating the benefits of scaling for data efficiency. Scaling also helps preserve the base model's capabilities in English, thus…
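
As a rough illustration of what "approximately FLOP-matched" can mean, the sketch below uses the common rule of thumb that training compute is about 6 × parameters × tokens. Whether the paper uses this exact estimate is not stated here, and the model sizes and token counts are hypothetical values chosen only to show the arithmetic.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the C ~ 6 * N * D rule of thumb (an assumption)."""
    return 6.0 * n_params * n_tokens


def flop_matched_tokens(small_params: float, small_tokens: float, large_params: float) -> float:
    """Tokens a larger model can see under the same compute budget as a smaller one."""
    budget = training_flops(small_params, small_tokens)
    return budget / (6.0 * large_params)


# Hypothetical configuration, not taken from the paper.
small_params = 1e9    # 1B-parameter model
small_tokens = 20e9   # continued pretraining on 20B tokens
large_params = 2e9    # upscaled 2B-parameter model

large_tokens = flop_matched_tokens(small_params, small_tokens, large_params)
print(f"FLOP-matched token budget for the larger model: {large_tokens:.3e}")
```

Under this approximation, doubling the parameter count halves the number of tokens available at a fixed budget, which is why matching a smaller model's quality with the larger one counts as a data-efficiency gain.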

Taxonomy

Topics: ICT in Developing Communities · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications