When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer
Ameet Deshpande, Partha Talukdar, Karthik Narasimhan

TL;DR
This paper investigates which linguistic properties influence cross-lingual transfer in multilingual models, highlighting the importance of word embedding alignment and the impact of script, word order, and syntax differences.
Contribution
It provides a large-scale empirical analysis isolating key linguistic factors affecting zero-shot transfer, emphasizing the role of explicit word embedding alignment.
Findings
The absence of sub-word overlap significantly hurts zero-shot transfer when the two languages also differ in word order.
Strong correlation (R=0.94) between transfer performance and word embedding alignment.
The results suggest that multilingual models should explicitly improve word embedding alignment between languages rather than rely on it emerging implicitly.
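The correlation reported in the findings is presumably a Pearson coefficient computed over paired measurements, with one alignment score and one transfer score per language pair. A minimal sketch of that computation follows, assuming plain Python lists as input. The function name `pearson_r` is illustrative and not taken from the paper's code, and no values from the paper are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers.

    Hypothetical helper: xs could hold alignment scores and ys the matching
    zero-shot transfer scores, one pair per language pair.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 means that language pairs with better-aligned embeddings also tend to transfer better. By itself it does not show that alignment causes the transfer.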
Abstract
While recent work on multilingual language models has demonstrated their capacity for cross-lingual zero-shot transfer on downstream tasks, there is a lack of consensus in the community as to what shared properties between languages enable such transfer. Analyses involving pairs of natural languages are often inconclusive and contradictory since languages simultaneously differ in many linguistic aspects. In this paper, we perform a large-scale empirical study to isolate the effects of various linguistic properties by measuring zero-shot transfer between four diverse natural languages and their counterparts constructed by modifying aspects such as the script, word order, and syntax. Among other things, our experiments show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order, and there is a strong correlation between…
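To make the idea of a constructed counterpart language concrete, here is a toy sketch of two such modifications: changing the script so that no symbols are shared with the original, and changing the word order. Both functions are assumptions for illustration, not the paper's actual transformations. `change_script` shifts ASCII characters into a disjoint Unicode range, and `invert_word_order` simply reverses whitespace-separated tokens.

```python
def change_script(sentence, offset=0x0400):
    """Replace every non-space character with one from a disjoint range.

    Toy assumption: the input is ASCII, so shifting code points by `offset`
    yields symbols that never occur in the original text. The mapping is
    one-to-one, so the original sentence can be recovered.
    """
    return "".join(c if c.isspace() else chr(ord(c) + offset) for c in sentence)


def restore_script(sentence, offset=0x0400):
    """Invert change_script, recovering the original characters."""
    return "".join(c if c.isspace() else chr(ord(c) - offset) for c in sentence)


def invert_word_order(sentence):
    """Reverse the order of whitespace-separated tokens in a sentence."""
    return " ".join(reversed(sentence.split()))


if __name__ == "__main__":
    original = "the cat sat on the mat"
    derived = invert_word_order(change_script(original))
    # The derived sentence shares no characters (besides spaces) with the original.
    shared = (set(original) & set(derived)) - {" "}
    print(derived)
    print("shared symbols:", shared)
```

Because each modification changes only one property and is invertible, a model's transfer between the original and the derived language can be attributed to that property alone, which is the motivation the abstract gives for using constructed counterparts.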
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
