Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

Felicia Körner; Maria Matveev; Florian Eichin; Gitta Kutyniok; Barbara Plank; Michael A. Hedderich

arXiv:2604.17633·cs.CL·April 21, 2026

TL;DR

This paper investigates how multilingual language models develop translation abilities during early training, revealing a two-phase process involving copying and the emergence of general translation mechanisms.

Contribution

It introduces a new dataset and fine-grained analysis methods to study the evolution of translation capabilities in multilingual models during pretraining.

Findings

01

The model quickly acquires basic linguistic capabilities in parallel with token-level copying.

02

Translation develops in two phases: initial copying and later generalization.

03

General translation mechanisms emerge only after an initial phase dominated by copying and surface-level similarities.

Abstract

Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual generalization emerges, particularly in the early phases of learning. To study the early trajectory of linguistic and translation capabilities, we pretrain a multilingual 1.7B model on nine diverse languages, capturing checkpoints at a much finer granularity. We further introduce a novel word-level translation dataset and trace how translation develops over training through behavioral analyses, model-component analysis, and parameter-based ablations. We find that the model quickly acquires basic linguistic capabilities in parallel with token-level copying, while translation develops in two distinct phases: an initial phase dominated by copying and surface-level…
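
The behavioral side of the analysis described in the abstract can be pictured with a small sketch. It is not the authors' evaluation code: the function names, the three labels, the exact-match rule, and the toy data are all assumptions made for illustration. The idea is to label each word-level prediction as a copy of the source word, a correct translation, or something else, and then track how those rates change across checkpoints.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# One evaluation row: (source word, model prediction, acceptable translations).
# This row layout is hypothetical, not the format of the paper's dataset.
Row = Tuple[str, str, Set[str]]
LABELS = ("copy", "translate", "other")


def classify_prediction(source: str, prediction: str, references: Iterable[str]) -> str:
    """Label one prediction as 'translate', 'copy', or 'other' (exact match, case-insensitive)."""
    pred = prediction.strip().lower()
    src = source.strip().lower()
    refs = {r.strip().lower() for r in references}
    if pred in refs:
        # Counted as a translation even when it equals the source (cognates, names).
        return "translate"
    if pred == src:
        return "copy"
    return "other"


def rates_per_checkpoint(outputs: Dict[int, List[Row]]) -> Dict[int, Dict[str, float]]:
    """Map each training step to the fraction of predictions carrying each label."""
    result: Dict[int, Dict[str, float]] = {}
    for step, rows in sorted(outputs.items()):
        if not rows:
            continue  # skip checkpoints with no evaluated items
        counts = Counter(classify_prediction(s, p, r) for s, p, r in rows)
        result[step] = {label: counts[label] / len(rows) for label in LABELS}
    return result


# Made-up toy predictions for two hypothetical checkpoints (English to German).
toy_outputs = {
    1000: [("dog", "dog", {"hund"}), ("house", "house", {"haus"})],
    20000: [("dog", "Hund", {"hund"}), ("house", "house", {"haus"})],
}
rates = rates_per_checkpoint(toy_outputs)
```

On real checkpoints, a copy rate that peaks early and a translation rate that rises later would be the kind of pattern the two-phase description refers to. Exact string matching is the simplest possible criterion; a real evaluation would need to decide how to treat cognates, inflected forms, and multi-token outputs.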
