On the limited utility of parallel data for learning shared multilingual representations
Julius Leino, Jörg Tiedemann

TL;DR
This paper investigates the role of parallel data in pretraining shared multilingual models and finds it has minimal impact on cross-lingual alignment, which emerges naturally without explicit signals.
Contribution
The study demonstrates that parallel data has limited effect on cross-lingual alignment, challenging assumptions about its necessity in multilingual pretraining.
Findings
Parallel data may slightly accelerate representation sharing in the early phases of pretraining.
Parallel data reduces the number of language-specific neurons in the model.
Cross-lingual alignment emerges at similar levels even without explicit parallel data.
Abstract
Shared multilingual representations are essential for cross-lingual tasks and knowledge transfer across languages. This study looks at the impact of parallel data, i.e. translated sentences, in pretraining as a signal to trigger representations that are aligned across languages. We train reference models with different proportions of parallel data and show that parallel data seem to have only a minimal effect on the cross-lingual alignment. Based on multiple evaluation methods, we find that the effect is limited to potentially accelerating the representation sharing in the early phases of pretraining, and to decreasing the number of language-specific neurons in the model. Cross-lingual alignment seems to emerge on similar levels even without the explicit signal from parallel data.
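The abstract does not name the evaluation methods used to measure cross-lingual alignment. Purely as an illustration, one widely used proxy is parallel sentence retrieval: embed each sentence and its translation, then check how often a sentence's nearest neighbour in the other language is its own translation. The sketch below is a minimal version under stated assumptions: the embeddings are given as plain vectors, `src[i]` and `tgt[i]` form a translation pair, and the helper names `cosine` and `retrieval_accuracy` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieval_accuracy(src: List[Vector], tgt: List[Vector]) -> float:
    """Hypothetical alignment proxy: fraction of source sentences whose
    most similar target embedding is their own translation.

    Assumes src[i] and tgt[i] are embeddings of a translation pair.
    """
    if len(src) != len(tgt) or not src:
        raise ValueError("need equal-length, non-empty lists of paired embeddings")
    hits = 0
    for i, s in enumerate(src):
        # Score every candidate translation and pick the most similar one.
        scores = [cosine(s, t) for t in tgt]
        best = max(range(len(tgt)), key=scores.__getitem__)
        hits += int(best == i)
    return hits / len(src)


# Toy usage: well-aligned pairs are retrieved correctly, shuffled ones are not.
src = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
tgt_aligned = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
tgt_shuffled = [[0.1, 0.9], [0.9, 0.1], [0.6, 0.8]]
print(retrieval_accuracy(src, tgt_aligned))   # 1.0
print(retrieval_accuracy(src, tgt_shuffled))  # 0.3333333333333333
```

Under this kind of proxy, a higher score for a model trained with more parallel data would indicate stronger alignment; the abstract reports that such differences turn out to be small.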
