On the limited utility of parallel data for learning shared multilingual representations

Julius Leino, Jörg Tiedemann

arXiv:2603.29026·cs.CL·April 1, 2026

TL;DR

This paper investigates the role of parallel data (translated sentences) in pretraining shared multilingual models. It finds that parallel data has only a minimal effect on cross-lingual alignment, which emerges at similar levels without that explicit signal.

Contribution

The study presents evidence that parallel data has only a limited effect on cross-lingual alignment, challenging the assumption that it is necessary in multilingual pretraining.

Findings

01

Parallel data may slightly accelerate representation sharing in the early phases of pretraining.

02

Parallel data reduces the number of language-specific neurons in the model.

03

Cross-lingual alignment emerges at similar levels even without explicit parallel data.

Abstract

Shared multilingual representations are essential for cross-lingual tasks and knowledge transfer across languages. This study looks at the impact of parallel data, i.e. translated sentences, in pretraining as a signal to trigger representations that are aligned across languages. We train reference models with different proportions of parallel data and show that parallel data seem to have only a minimal effect on cross-lingual alignment. Based on multiple evaluation methods, we find that the effect is limited to potentially accelerating representation sharing in the early phases of pretraining, and to decreasing the number of language-specific neurons in the model. Cross-lingual alignment seems to emerge at similar levels even without the explicit signal from parallel data.
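
To make "cross-lingual alignment" concrete, the sketch below shows one common way such alignment can be measured: nearest-neighbour retrieval of translations by cosine similarity. The abstract only says that multiple evaluation methods are used, so this is an illustrative assumption and not necessarily one of the paper's metrics. The function names (`cosine`, `retrieval_accuracy`) and the toy two-dimensional "embeddings" are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieval_accuracy(src_embs, tgt_embs):
    """Fraction of source sentences whose most similar target is their translation.

    Assumes src_embs[i] and tgt_embs[i] are embeddings of the same sentence
    in two different languages (hypothetical setup, not taken from the paper).
    """
    hits = 0
    for i, src in enumerate(src_embs):
        # Index of the target embedding closest to this source embedding.
        best = max(range(len(tgt_embs)), key=lambda j: cosine(src, tgt_embs[j]))
        hits += int(best == i)
    return hits / len(src_embs)


# Made-up toy embeddings for three translated sentence pairs.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]

aligned = retrieval_accuracy(src, tgt)           # pairs in matching order
misaligned = retrieval_accuracy(src, tgt[::-1])  # target order reversed
```

Under this kind of metric, a model with well-shared representations scores close to 1.0 on held-out translation pairs, while a model with language-specific representations scores near chance. Comparing that score across models trained with different proportions of parallel data is one way the abstract's question could be posed.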
