TL;DR
This paper introduces a machine learning-based method that uses a model's internal representations to select among fine-tuned multilingual models, improving zero-shot cross-lingual performance across many languages without relying on target-language data.
Contribution
It proposes a novel model selection approach leveraging internal representations to enhance zero-shot cross-lingual transfer performance.
Findings
Consistently selects better models than selection based on English validation data.
Achieves results comparable to selection based on target-language validation data in many cases.
Effective across 25 diverse languages, including low-resource ones.
Abstract
Transformers that are pre-trained on multilingual corpora, such as mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer capabilities. In the zero-shot transfer setting, only English training data is used, and the fine-tuned model is evaluated on another target language. While this works surprisingly well, substantial variance has been observed in target-language performance between different fine-tuning runs, and in the zero-shot setup, no target-language development data is available to select among multiple fine-tuned models. Prior work has relied on English dev data to select among models that are fine-tuned with different learning rates, numbers of steps and other hyperparameters, often resulting in suboptimal choices. In this paper, we show that it is possible to select consistently better models when small amounts of annotated data are available in auxiliary…
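The English-dev baseline that the abstract describes as suboptimal can be sketched as follows. This is only an illustration of the selection problem, not the paper's proposed method. The `Candidate` class, the function names, and every score are hypothetical placeholders invented for the example; none of them are results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One fine-tuning run (hypothetical container for this sketch)."""
    name: str
    en_dev: float   # English dev accuracy: available in the zero-shot setup
    target: float   # target-language accuracy: NOT available in zero-shot;
                    # included here only to measure how good the choice was


def select_by_english_dev(candidates):
    """Baseline from prior work: pick the run with the best English dev score."""
    return max(candidates, key=lambda c: c.en_dev)


def oracle(candidates):
    """Best possible choice, which would need target-language dev data."""
    return max(candidates, key=lambda c: c.target)


def selection_gap(candidates):
    """Target-language accuracy lost by selecting on English dev data."""
    return oracle(candidates).target - select_by_english_dev(candidates).target


# Made-up placeholder scores, chosen so the English-dev winner
# is not the best run on the target language.
CANDIDATES = [
    Candidate("run_a", en_dev=0.910, target=0.702),
    Candidate("run_b", en_dev=0.915, target=0.655),
    Candidate("run_c", en_dev=0.905, target=0.688),
]

if __name__ == "__main__":
    chosen = select_by_english_dev(CANDIDATES)
    print(f"English-dev choice: {chosen.name}")
    print(f"Oracle choice:      {oracle(CANDIDATES).name}")
    print(f"Selection gap:      {selection_gap(CANDIDATES):.3f}")
```

With these placeholder scores the runs are nearly tied on English dev data but differ noticeably on the target language, which is the situation in which English-dev selection is unreliable.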
Taxonomy
Methods: mBERT
