From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers
Anne Lauscher, Vinit Ravishankar, Ivan Vulić, Goran Glavaš

TL;DR
This paper investigates the limitations of zero-shot cross-lingual transfer using multilingual transformers, highlighting challenges with resource-scarce and distant languages, and demonstrating the effectiveness of few-shot transfer methods.
Contribution
The study provides an empirical analysis of the limitations of zero-shot transfer and shows that few-shot transfer can significantly improve performance across diverse languages and tasks.
Findings
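Transfer performance correlates with the linguistic similarity between source and target languages and with the size of the target-language pretraining corpus.
Zero-shot transfer is substantially less effective for resource-lean and distant languages.
Few-shot transfer, i.e., additional fine-tuning on a small number of target-language instances, can substantially improve results.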
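The first finding above is a correlation claim. As a minimal sketch of how such a relationship could be checked, the snippet below computes a Pearson correlation between a per-language similarity measure and a per-language transfer score. The function name `pearson` and the variable names are assumptions made for illustration, and the numbers are invented placeholders, not results from the paper. The paper's own analysis may use different similarity measures and statistics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only (hypothetical, NOT taken from the paper):
# one entry per target language.
similarity_to_source = [0.9, 0.7, 0.5, 0.3]   # hypothetical similarity scores
zero_shot_score = [80.0, 72.0, 61.0, 48.0]    # hypothetical task scores

r = pearson(similarity_to_source, zero_shot_score)
print(f"Pearson r = {r:.3f}")
```

The same function could be applied to the second factor named in the finding by replacing the similarity list with, for example, the log of each target language's pretraining corpus size.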
Abstract
Massively multilingual transformers pretrained with language modeling objectives (e.g., mBERT, XLM-R) have become a de facto default transfer paradigm for zero-shot cross-lingual transfer in NLP, offering unmatched transfer performance. Current downstream evaluations, however, verify their efficacy predominantly in transfer settings involving languages with sufficient amounts of pretraining data, and with lexically and typologically close languages. In this work, we analyze their limitations and show that cross-lingual transfer via massively multilingual transformers, much like transfer via cross-lingual word embeddings, is substantially less effective in resource-lean scenarios and for distant languages. Our experiments, encompassing three lower-level tasks (POS tagging, dependency parsing, NER), as well as two high-level semantic tasks (NLI, QA), empirically correlate transfer…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: mBERT
