Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
Shijie Wu, Mark Dredze

TL;DR
This paper investigates the cross-lingual transfer capabilities of multilingual BERT (mBERT) across five NLP tasks and 39 languages, showing that its performance is competitive and analyzing the factors that influence its effectiveness.
Contribution
It provides a comprehensive evaluation of mBERT's zero-shot cross-lingual transfer across multiple tasks and languages, showing that mBERT is surprisingly effective and identifying strategies for using it well.
Findings
mBERT is competitive with state-of-the-art zero-shot transfer methods.
Effective strategies for utilizing mBERT in cross-lingual tasks are identified.
Factors influencing cross-lingual transfer are analyzed.
Abstract
Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best-published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language…
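The zero-shot transfer setup described in the abstract (train on labeled data in one source language, then evaluate directly on other languages without any target-language training labels) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: `SHARED_FEATURE`, `encode`, `fit`, and `zero_shot_transfer` are hypothetical names, and the hand-written cross-lingual word mapping only stands in for the shared representation space that mBERT learns during multilingual pretraining. In a real experiment, `fit` would instead fine-tune mBERT on source-language (e.g. English) training data.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (sentence, label)

# Hypothetical stand-in for mBERT's shared multilingual space: words with the
# same meaning in different languages map to one language-independent feature.
SHARED_FEATURE = {
    "good": "f_good", "gut": "f_good", "bueno": "f_good",
    "bad": "f_bad", "schlecht": "f_bad", "malo": "f_bad",
}


def encode(sentence: str) -> List[str]:
    """Map each known word to its shared feature; unknown words are dropped."""
    return [SHARED_FEATURE[w] for w in sentence.lower().split() if w in SHARED_FEATURE]


def fit(train: Sequence[Example]) -> Callable[[str], str]:
    """Learn feature -> label counts from source-language training data only."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence, label in train:
        for feat in encode(sentence):
            counts[feat][label] += 1
    # Majority training label, used when a sentence has no known features.
    fallback = Counter(label for _, label in train).most_common(1)[0][0]

    def predict(sentence: str) -> str:
        votes: Counter = Counter()
        for feat in encode(sentence):
            votes.update(counts.get(feat, Counter()))
        return votes.most_common(1)[0][0] if votes else fallback

    return predict


def zero_shot_transfer(
    fit_fn: Callable[[Sequence[Example]], Callable[[str], str]],
    source_train: Sequence[Example],
    target_test: Dict[str, Sequence[Example]],
) -> Dict[str, float]:
    """Train once on the source language, then report accuracy per target language.

    Target-language labels are used only for scoring, never for training.
    """
    predict = fit_fn(source_train)
    scores: Dict[str, float] = {}
    for lang, examples in target_test.items():
        correct = sum(predict(text) == label for text, label in examples)
        scores[lang] = correct / len(examples) if examples else 0.0
    return scores


# Usage: train on English only, evaluate on German and Spanish.
english_train = [("a good film", "pos"), ("good acting", "pos"), ("a bad plot", "neg")]
targets = {
    "de": [("sehr gut", "pos"), ("sehr schlecht", "neg")],
    "es": [("muy bueno", "pos"), ("muy malo", "neg")],
}
print(zero_shot_transfer(fit, english_train, targets))  # {'de': 1.0, 'es': 1.0}
```

Because `fit` only ever sees English labels, the German and Spanish scores in this toy come entirely from the shared feature mapping. With real mBERT, how well that shared space transfers across tasks and languages is an empirical question, which is what the paper measures.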
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · mBERT · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece
