Cross-Lingual Ability of Multilingual BERT: An Empirical Study
Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth

TL;DR
This study investigates which factors drive multilingual BERT's (M-BERT's) unexpected cross-lingual ability, examining the model architecture, the linguistic properties of the languages, and the learning objectives across three typologically different languages and two NLP tasks.
Contribution
It provides a detailed analysis of which components contribute to M-BERT's cross-lingual transfer, concluding that network depth is integral to it while lexical overlap plays a negligible role.
Findings
Network depth is crucial for cross-lingual transfer.
Lexical overlap has minimal impact on cross-lingual success.
The experiments cover three typologically different languages (Spanish, Hindi, and Russian) and two tasks (textual entailment and named entity recognition).
Abstract
Recent work has exhibited the surprising cross-lingual abilities of multilingual BERT (M-BERT) -- surprising since it is trained without any cross-lingual objective and with no aligned data. In this work, we provide a comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability. We study the impact of linguistic properties of the languages, the architecture of the model, and the learning objectives. The experimental study is done in the context of three typologically different languages -- Spanish, Hindi, and Russian -- and using two conceptually different NLP tasks, textual entailment and named entity recognition. Among our key conclusions is the fact that the lexical overlap between languages plays a negligible role in the cross-lingual success, while the depth of the network is an integral part of it. All our models and implementations can be…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
