Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation
Ahmed Elhady, Eneko Agirre, Mikel Artetxe

TL;DR
This paper studies the role of English data when large language models are adapted to new languages through continued pretraining. It finds that including English is crucial for the emergence of downstream capabilities in the target language, and it proposes methods that reduce the need for English data.
Contribution
It uncovers the role of English data in the emergence of downstream abilities during language adaptation, and introduces curriculum learning and exponential moving average (EMA) as effective alternatives to including English.
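To make the curriculum idea concrete, here is a minimal sketch of one way a data mixture could change over training. The summary does not describe the actual schedule, so everything below is an assumption for illustration: the linear decay, the function names `english_fraction` and `sample_language`, and the starting fraction of 0.5 are hypothetical and are not taken from the paper.

```python
import random


def english_fraction(step, total_steps, start=0.5, end=0.0):
    """Hypothetical linear schedule for the share of English data at a given step.

    `start` and `end` are illustrative values, not figures from the paper.
    """
    if total_steps <= 1:
        return end
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * progress


def sample_language(step, total_steps, rng, start=0.5, end=0.0):
    """Pick which corpus the next training document is drawn from."""
    p_english = english_fraction(step, total_steps, start, end)
    return "english" if rng.random() < p_english else "target"


if __name__ == "__main__":
    rng = random.Random(0)
    total = 101
    for step in (0, 50, 100):
        print(step, english_fraction(step, total), sample_language(step, total, rng))
```

Under this assumed schedule, English is sampled often at the start of continued pretraining and not at all by the final step, so the end of training uses target-language data only.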
Findings
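Including English data does not affect validation perplexity, but it is critical for downstream capabilities in the target language.
Without English data, catastrophic forgetting of in-context learning occurs early in continued pretraining.
Curriculum learning and EMA reduce the dependence on English data.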
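As a rough illustration of the EMA idea, the sketch below keeps an exponential moving average of model parameters alongside the trained ones. It assumes EMA is applied to the weights, which the summary does not state explicitly; the function name `ema_update`, the dictionary-of-floats parameter format, and the decay of 0.99 are hypothetical choices, not details from the paper.

```python
def ema_update(ema_params, params, decay=0.99):
    """Return new averaged parameters: ema <- decay * ema + (1 - decay) * params.

    Both arguments map parameter names to floats. The decay value is an
    illustrative assumption, not a setting reported in the paper.
    """
    return {
        name: decay * ema_params[name] + (1.0 - decay) * value
        for name, value in params.items()
    }


if __name__ == "__main__":
    ema = {"w": 1.0}
    # Pretend the optimizer moved the raw weight to 0.0 and kept it there.
    for _ in range(3):
        ema = ema_update(ema, {"w": 0.0}, decay=0.9)
    print(ema)
```

Because the averaged copy moves only a fraction of the way toward the current weights at each step, it changes more slowly than the raw parameters. That is one plausible reason an average of this kind could limit the large parameter shift the abstract associates with forgetting.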
Abstract
Continued pretraining (CPT) is a popular approach to adapt existing large language models (LLMs) to new languages. When doing so, it is common practice to include a portion of English data in the mixture, but its role has not been carefully studied to date. In this work, we show that including English does not impact validation perplexity, yet it is critical for the emergence of downstream capabilities in the target language. We introduce a language-agnostic benchmark for in-context learning (ICL), which reveals catastrophic forgetting early on CPT when English is not included. This in turn damages the ability of the model to generalize to downstream prompts in the target language as measured by perplexity, even if it does not manifest in terms of accuracy until later in training, and can be tied to a big shift in the model parameters. Based on these insights, we introduce curriculum…
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Multimodal Machine Learning Applications
