Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer
Haoran Xu, Kenton Murray

TL;DR
This paper introduces a novel mixed training approach with gradient optimization for few-shot cross-lingual transfer, outperforming traditional target-adapting methods across multiple NLP tasks and languages.
Contribution
It proposes a one-step mixed training method using stochastic gradient surgery, enabling a single model to handle multiple target languages simultaneously without large development sets.
Findings
Achieves state-of-the-art results on 4 NLP tasks across 48 languages.
Significantly outperforms target-adapting methods, especially for distant languages.
Demonstrates effectiveness without development sets and reduces overfitting.
Abstract
The current state-of-the-art for few-shot cross-lingual transfer learning first trains on abundant labeled data in the source language and then fine-tunes with a few examples on the target language, termed target-adapting. Though this has been demonstrated to work on a variety of tasks, in this paper we show some deficiencies of this approach and propose a one-step mixed training method that trains on both source and target data with stochastic gradient surgery, a novel gradient-level optimization. Unlike the previous studies that focus on one language at a time when target-adapting, we use one model to handle all target languages simultaneously to avoid excessively language-specific models. Moreover, we discuss the unreality of utilizing large target development sets for model selection in previous literature. We further show that our method is both development-free for target…
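The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea behind gradient surgery, not the authors' implementation. It assumes a projection rule of the usual form: when the source-language gradient conflicts with a target-language gradient (negative dot product), the conflicting component is removed before the two are combined. The per-step random choice of one target language, the summing of the two gradients, and the names `project_if_conflicting` and `mixed_update` are all assumptions made for illustration.

```python
import random


def dot(u, v):
    """Dot product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_if_conflicting(g, g_ref):
    """Remove from g its component along g_ref, but only when the two conflict.

    "Conflict" here means a negative dot product. A negative dot product
    implies g_ref is non-zero, so the division below is safe.
    """
    d = dot(g, g_ref)
    if d >= 0:
        return list(g)  # no conflict: leave the gradient unchanged
    scale = d / dot(g_ref, g_ref)
    return [a - scale * b for a, b in zip(g, g_ref)]


def mixed_update(source_grad, target_grads, rng):
    """One hypothetical mixed-training step.

    Assumption (not taken from the source): a single target language is
    sampled per step, the source gradient is de-conflicted against it,
    and the two gradients are then summed.
    """
    lang = rng.choice(sorted(target_grads))
    g_tgt = target_grads[lang]
    g_src = project_if_conflicting(source_grad, g_tgt)
    combined = [a + b for a, b in zip(g_src, g_tgt)]
    return lang, combined


if __name__ == "__main__":
    rng = random.Random(0)
    source = [1.0, 0.0]
    targets = {"de": [-1.0, 1.0], "sv": [1.0, 1.0]}
    print(mixed_update(source, targets, rng))
```

In this toy setup, a source gradient that points against the sampled target gradient is projected onto the plane orthogonal to it, so the combined step no longer pushes against the few-shot target examples. A real training loop would flatten the model's parameter gradients into such vectors before applying the same projection.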
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
