Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training
Kuan-Hao Huang, Wasi Uddin Ahmad, Nanyun Peng, Kai-Wei Chang

TL;DR
This paper introduces a robust training approach using adversarial training and randomized smoothing to enhance zero-shot cross-lingual transfer in multilingual models, especially for low-resource languages, without relying on costly parallel corpora.
Contribution
It proposes a novel robust training strategy that improves cross-lingual transfer by making models tolerant to embedding noise, reducing dependence on language alignment data.
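To make "tolerant to embedding noise" concrete, here is a minimal sketch of adversarial training in embedding space. It is not the authors' implementation: it assumes a toy logistic classifier over a single fixed-size embedding, and every name in it (`adversarial_embedding`, `adversarial_step`, `eps`, `lr`) is hypothetical. For a linear model the worst-case shift inside an L2 ball has a closed form, which keeps the example short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic classifier on embedding x."""
    p = sigmoid(dot(w, x) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def adversarial_embedding(w, b, x, y, eps):
    """Return x + delta, where delta has L2 norm eps and maximizes the loss.

    For this linear model the input gradient is (p - y) * w, so the
    loss-maximizing direction is that gradient, normalized.
    """
    p = sigmoid(dot(w, x) + b)
    grad = [(p - y) * wi for wi in w]
    norm = math.sqrt(dot(grad, grad))
    if norm == 0.0:
        return list(x)  # no informative direction; leave x unchanged
    return [xi + eps * gi / norm for xi, gi in zip(x, grad)]

def adversarial_step(w, b, x, y, eps, lr):
    """One SGD step on the loss evaluated at the perturbed embedding."""
    x_adv = adversarial_embedding(w, b, x, y, eps)
    p = sigmoid(dot(w, x_adv) + b)
    new_w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_adv)]
    new_b = b - lr * (p - y)
    return new_w, new_b
```

With a deep encoder the perturbation would normally be found by gradient steps rather than in closed form; the sketch only illustrates the idea of training on the worst-case neighbor of each embedding.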
Findings
Robust training significantly improves zero-shot transfer performance.
Robust training also improves results in generalized cross-lingual transfer, where inputs mix languages.
Robust methods outperform standard fine-tuning in low-resource scenarios.
Abstract
Pre-trained multilingual language encoders, such as multilingual BERT and XLM-R, show great potential for zero-shot cross-lingual transfer. However, these multilingual encoders do not precisely align words and phrases across languages. In particular, learning alignments in the multilingual embedding space usually requires sentence-level or word-level parallel corpora, which are expensive to obtain for low-resource languages. An alternative is to make the multilingual encoders more robust: when fine-tuning the encoder on a downstream task, we train the encoder to tolerate noise in the contextual embedding space, so that even if the representations of different languages are not well aligned, the model can still achieve good performance on zero-shot cross-lingual transfer. In this work, we propose a learning strategy for training robust models by drawing connections between…
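The noise tolerance described in the abstract can also be illustrated with a generic randomized-smoothing sketch. It is an assumption-laden toy, not the paper's procedure: it assumes Gaussian noise added directly to an embedding vector and a hypothetical linear base classifier, and the names `noisy_copies`, `base_predict`, `smoothed_predict`, `sigma`, and `n` are invented for the example.

```python
import random

def noisy_copies(x, sigma, n, rng):
    """Sample n copies of embedding x with i.i.d. Gaussian noise added."""
    return [[xi + rng.gauss(0.0, sigma) for xi in x] for _ in range(n)]

def base_predict(w, b, x):
    """Hypothetical base classifier: thresholded linear score, 0 or 1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

def smoothed_predict(w, b, x, sigma, n, rng):
    """Majority vote of the base classifier over noisy copies of x."""
    votes = sum(base_predict(w, b, z) for z in noisy_copies(x, sigma, n, rng))
    return 1 if 2 * votes > n else 0
```

The same `noisy_copies` helper could supply perturbed training examples during fine-tuning, so that the classifier sees a neighborhood of each embedding rather than a single point.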
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: XLM-R · Linear Layer · Dropout · Adam · Dense Connections · Attention Is All You Need · Softmax · Linear Warmup With Linear Decay · WordPiece
