How multilingual is Multilingual BERT?
Telmo Pires, Eva Schlinger, Dan Garrette

TL;DR
This paper investigates how well Multilingual BERT (M-BERT), pre-trained on monolingual corpora in 104 languages, transfers across languages in a zero-shot setting. It finds that transfer works surprisingly well, but that the model's multilingual representations have systematic deficiencies affecting certain language pairs.
Contribution
The study presents a large number of probing experiments that characterize M-BERT's cross-lingual transfer: where it succeeds (including between languages written in different scripts) and where it falls short (systematic deficiencies for certain language pairs).
Findings
M-BERT performs well in zero-shot cross-lingual transfer.
Transfer is more effective between typologically similar languages.
Monolingual pre-training corpora are enough to train models that handle code-switching, and the model can find translation pairs.
Abstract
In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.
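The abstract states that the model can find translation pairs but does not say how this is measured. One common way to test such a claim is nearest-neighbor retrieval over sentence vectors; the sketch below illustrates that idea only. The function names and the toy three-dimensional vectors are hypothetical stand-ins chosen for illustration. They are not M-BERT outputs, and this is not necessarily the authors' procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor_accuracy(src_vecs, tgt_vecs):
    """Fraction of source vectors whose most similar target vector
    sits at the same index, i.e. is the aligned translation."""
    hits = 0
    for i, u in enumerate(src_vecs):
        best = max(range(len(tgt_vecs)), key=lambda j: cosine(u, tgt_vecs[j]))
        hits += int(best == i)
    return hits / len(src_vecs)


# Hypothetical toy "sentence vectors": row i of src and row i of tgt
# stand for a sentence and its translation.
src = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
tgt = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.2, 0.1, 0.9]]

accuracy = nearest_neighbor_accuracy(src, tgt)
print(accuracy)
```

In a real experiment the vectors would come from the model's hidden states for parallel sentences, and the retrieval accuracy would indicate how closely translations sit together in the representation space.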
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
