Zero-shot Cross-lingual Transfer is Under-specified Optimization
Shijie Wu, Benjamin Van Durme, Mark Dredze

TL;DR
This paper investigates why zero-shot cross-lingual transfer with pretrained multilingual encoders often results in high variance and unreliable performance, attributing it to an under-specified optimization problem that affects solution stability.
Contribution
The study shows that zero-shot transfer solutions lie in non-flat regions of the target language error surface, which explains the high variance. It reaches this conclusion by analyzing models linearly interpolated between the source language monolingual model and the source + target bilingual model.
Findings
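Models linearly interpolated between the source monolingual model and the source + target bilingual model all have equally low source language error, while target language error decreases smoothly and linearly toward the bilingual model.
Zero-shot solutions lie in non-flat regions of the target language generalization error surface.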
High variance is linked to solutions in non-flat regions of the target error landscape.
Abstract
Pretrained multilingual encoders enable zero-shot cross-lingual transfer, but often produce unreliable models that exhibit high performance variance on the target language. We postulate that this high variance results from zero-shot cross-lingual transfer solving an under-specified optimization problem. We show that any linearly interpolated model between the source language monolingual model and the source + target bilingual model has equally low source language generalization error, yet the target language generalization error decreases smoothly and linearly as we move from the monolingual to the bilingual model, suggesting that the model struggles to identify good solutions for both source and target languages using the source language alone. Additionally, we show that the zero-shot solution lies in a non-flat region of the target language generalization error surface, causing the high variance.
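The linear interpolation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `interpolate` and `interpolation_path`, the dict-of-lists parameter layout, and the toy values are all assumptions made for the example. Each interpolated model is theta(alpha) = (1 - alpha) * theta_mono + alpha * theta_bi, with alpha = 0 giving the source monolingual model and alpha = 1 the source + target bilingual model.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def interpolate(theta_mono: Params, theta_bi: Params, alpha: float) -> Params:
    """Return (1 - alpha) * theta_mono + alpha * theta_bi, parameter by parameter."""
    if theta_mono.keys() != theta_bi.keys():
        raise ValueError("both models must have the same parameter names")
    out: Params = {}
    for name in theta_mono:
        a, b = theta_mono[name], theta_bi[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        out[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return out


def interpolation_path(
    theta_mono: Params, theta_bi: Params, steps: int
) -> List[Tuple[float, Params]]:
    """Evenly spaced models from alpha = 0 (monolingual) to alpha = 1 (bilingual)."""
    return [
        (i / steps, interpolate(theta_mono, theta_bi, i / steps))
        for i in range(steps + 1)
    ]


# Toy example with made-up weights.
mono = {"w": [0.0, 2.0]}
bi = {"w": [4.0, 2.0]}
midpoint = interpolate(mono, bi, 0.5)
path = interpolation_path(mono, bi, steps=4)
```

To reproduce the kind of analysis the abstract describes, one would evaluate source language and target language error for each model along `path`. The evaluation step is omitted here because it depends on the task and encoder.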
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
