A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters
Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Ivan Vulić, Roi Reichart, Anna Korhonen, Hinrich Schütze

TL;DR
This paper investigates the sensitivity of few-shot crosslingual transfer to shot selection, demonstrating the importance of standardization and showing that full model fine-tuning can outperform other methods.
Contribution
It highlights the high sensitivity of few-shot transfer to shot selection and provides a large-scale analysis across multiple languages and tasks, advocating for standardized experimental protocols.
Findings
Few-shot transfer performance varies significantly with shot choice.
Full model fine-tuning outperforms several state-of-the-art few-shot methods.
Providing sampled shots promotes standardization in future research.
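The shot-sensitivity finding above can be illustrated with a minimal sketch: draw several few-shot sets with fixed seeds, score each one, and report the spread rather than a single number. Everything here is an assumption made for illustration and is not the authors' code: the function names `sample_shot_sets`, `toy_score`, and `summarize`, the synthetic pool, and the set size of 8. In particular, `toy_score` is only a stand-in for fine-tuning a model on the shots and evaluating it on a target-language test set.

```python
import random
import statistics


def sample_shot_sets(pool, k, num_sets, base_seed=0):
    """Draw num_sets few-shot sets of size k, one per fixed seed.

    Fixing the seeds makes every set reproducible, so the exact
    same shots can be shared and reused across experiments.
    """
    shot_sets = []
    for i in range(num_sets):
        rng = random.Random(base_seed + i)  # independent, seeded generator per set
        shot_sets.append(rng.sample(pool, k))  # sample without replacement
    return shot_sets


def toy_score(shots, num_labels):
    """Hypothetical stand-in for 'fine-tune on shots, then evaluate'.

    Returns the fraction of labels covered by the shots; a real
    experiment would return task accuracy or F1 instead.
    """
    return len({label for _, label in shots}) / num_labels


def summarize(scores):
    """Report the spread across shot sets, not just one number."""
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Synthetic pool of (text, label) pairs with 4 labels (illustrative only).
pool = [(f"sentence {i}", i % 4) for i in range(200)]
shot_sets = sample_shot_sets(pool, k=8, num_sets=40)
scores = [toy_score(shots, num_labels=4) for shots in shot_sets]
summary = summarize(scores)
print(summary)
```

Reporting the minimum, maximum, and standard deviation alongside the mean makes the dependence on shot choice visible, and publishing the seeds (or the sampled shots themselves) lets others rerun the same comparison.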
Abstract
Few-shot crosslingual transfer has been shown to outperform its zero-shot counterpart with pretrained encoders like multilingual BERT. Despite its growing popularity, little to no attention has been paid to standardizing and analyzing the design of few-shot experiments. In this work, we highlight a fundamental risk posed by this shortcoming, illustrating that the model exhibits a high degree of sensitivity to the selection of few shots. We conduct a large-scale experimental study on 40 sets of sampled few shots for six diverse NLP tasks across up to 40 languages. We provide an analysis of success and failure cases of few-shot transfer, which highlights the role of lexical features. Additionally, we show that a straightforward full model finetuning approach is quite effective for few-shot transfer, outperforming several state-of-the-art few-shot approaches. As a step towards…
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Linear Layer · Adam · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Attention Dropout · Dense Connections
