Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining
Felicia Körner, Maria Matveev, Florian Eichin, Gitta Kutyniok, Barbara Plank, Michael A. Hedderich

TL;DR
This paper investigates how multilingual language models develop translation abilities during early training, revealing a two-phase process involving copying and the emergence of general translation mechanisms.
Contribution
It introduces a novel word-level translation dataset and fine-grained analyses (behavioral analyses, model-component analysis, and parameter-based ablations) to study how translation capabilities evolve during multilingual pretraining.
Findings
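The model quickly acquires basic linguistic capabilities in parallel with token-level copying.
Translation develops in two distinct phases: an initial phase dominated by copying and surface-level similarities, followed by a later phase of generalization.
More general translation mechanisms emerge only after the initial phase driven by surface-level similarities.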
Abstract
Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual generalization emerges, particularly in the early phases of learning. To study the early trajectory of linguistic and translation capabilities, we pretrain a multilingual 1.7B model on nine diverse languages, capturing checkpoints at a much finer granularity. We further introduce a novel word-level translation dataset and trace how translation develops over training through behavioral analyses, model-component analysis, and parameter-based ablations. We find that the model quickly acquires basic linguistic capabilities in parallel with token-level copying, while translation develops in two distinct phases: an initial phase dominated by copying and surface-level…
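To make the copying-versus-translation distinction concrete, here is a minimal sketch of how word-level predictions might be scored at a given checkpoint. It is not the authors' evaluation code: the function names (`classify_prediction`, `summarize`), the three labels, the normalization, and the English-to-German examples are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def classify_prediction(source: str, target: str, prediction: str) -> str:
    """Label one word-level prediction as 'translation', 'copy', or 'other'.

    Hypothetical scoring rule: compare the model output against the source
    word and the reference translation after simple normalization.
    """
    def norm(s: str) -> str:
        return s.strip().casefold()

    src, tgt, pred = norm(source), norm(target), norm(prediction)
    if pred == src:
        # Also covers identical source/target pairs, where copying and
        # translating cannot be told apart from the output alone.
        return "copy"
    if pred == tgt:
        return "translation"
    return "other"


def summarize(examples):
    """Return the fraction of each label over (source, target, prediction) triples."""
    counts = Counter(classify_prediction(*ex) for ex in examples)
    total = sum(counts.values())
    labels = ("translation", "copy", "other")
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: counts[label] / total for label in labels}


# Made-up outputs for two checkpoints, for illustration only.
early = [("dog", "Hund", "dog"), ("house", "Haus", "house"), ("water", "Wasser", "Wasser")]
late = [("dog", "Hund", "Hund"), ("house", "Haus", "Haus"), ("water", "Wasser", "Wasser")]

print(summarize(early))
print(summarize(late))
```

Tracking these fractions across densely spaced checkpoints would show whether outputs shift from the "copy" label toward the "translation" label over training, which is the kind of behavioral trajectory the abstract describes.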
