Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer
Iulia Turc, Kenton Lee, Jacob Eisenstein, Ming-Wei Chang, Kristina Toutanova

TL;DR
This paper challenges the dominance of English as the primary transfer language in zero-shot cross-lingual tasks, showing that other high-resource languages can often transfer more effectively across diverse target languages.
Contribution
It systematically compares English with other languages for transfer, revealing that languages like German and Russian can outperform English in zero-shot settings.
Findings
German and Russian often transfer more effectively than English.
Translation-based training sets can favor non-English transfer languages.
These results have implications for the design of multilingual benchmarks and systems.
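The kind of comparison behind these findings can be sketched as follows: fine-tune on each candidate transfer language, evaluate on every target language, and rank the candidates by their average zero-shot score. This is a minimal illustration, not the authors' evaluation code. The function name, the language codes ("aa", "bb", ...), and all numbers are hypothetical placeholders and are not results from the paper.

```python
from statistics import mean


def rank_transfer_languages(scores):
    """Rank candidate transfer languages by mean zero-shot score.

    `scores` maps transfer language -> {target language -> metric}.
    The transfer language itself is excluded from its own average,
    so only genuinely zero-shot targets are counted.
    """
    averages = {}
    for transfer_lang, per_target in scores.items():
        zero_shot = [
            value
            for target_lang, value in per_target.items()
            if target_lang != transfer_lang
        ]
        averages[transfer_lang] = mean(zero_shot)
    # Highest average first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Placeholder values for illustration only (NOT taken from the paper).
placeholder_scores = {
    "aa": {"aa": 0.90, "bb": 0.70, "cc": 0.60, "dd": 0.50},
    "bb": {"aa": 0.80, "bb": 0.88, "cc": 0.70, "dd": 0.60},
}

ranking = rank_transfer_languages(placeholder_scores)
best_language, best_average = ranking[0]
```

Averaging over targets is only one possible aggregation. With a diverse target set, a per-target breakdown may be more informative than a single mean.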
Abstract
Despite their success, large pre-trained multilingual models have not completely alleviated the need for labeled data, which is cumbersome to collect for all target languages. Zero-shot cross-lingual transfer is emerging as a practical solution: pre-trained models later fine-tuned on one transfer language exhibit surprising performance when tested on many target languages. English is the dominant source language for transfer, as reinforced by popular zero-shot benchmarks. However, this default choice has not been systematically vetted. In our study, we compare English against other transfer languages for fine-tuning, on two pre-trained multilingual models (mBERT and mT5) and multiple classification and question answering tasks. We find that other high-resource languages such as German and Russian often transfer more effectively, especially when the set of target languages is diverse or…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
