XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering
Jasdeep Singh, Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

TL;DR
XLDA is a cross-lingual data augmentation technique that replaces a segment of the input text with its translation in another language. It improves performance on multilingual NLP tasks and reaches state-of-the-art XNLI results for Greek, Turkish, and Urdu.
Contribution
The paper introduces XLDA, a novel data augmentation method that significantly enhances multilingual NLP performance, especially for low-resource languages.
Findings
XLDA improves performance on all 14 tested languages of the XNLI benchmark.
Achieves up to 4.8% improvement, setting new state-of-the-art for Greek, Turkish, and Urdu.
Provides a 1.0% increase on SQuAD question answering task.
Abstract
While natural language processing systems often focus on a single language, multilingual transfer learning has the potential to improve performance, especially for low-resource languages. We introduce XLDA, cross-lingual data augmentation, a method that replaces a segment of the input text with its translation in another language. XLDA enhances performance of all 14 tested languages of the cross-lingual natural language inference (XNLI) benchmark. With improvements of up to 4.8%, training with XLDA achieves state-of-the-art performance for Greek, Turkish, and Urdu. XLDA is in contrast to, and performs markedly better than, a more naive approach that aggregates examples in various languages in a way that each example is solely in one language. On the SQuAD question answering task, we see that XLDA provides a 1.0% performance increase on the English evaluation set. Comprehensive…
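To make the contrast in the abstract concrete, here is a minimal sketch of the two augmentation strategies on a single NLI example. It is not the authors' implementation: the data layout, the toy sentences, and the function names `xlda_augment` and `naive_aggregate` are all assumptions made for illustration, and the translations are assumed to be available up front.

```python
import random

# Toy parallel NLI example. The data layout, field names, and sentences are
# illustrative assumptions, not taken from the paper or the XNLI release.
PARALLEL_EXAMPLE = {
    "premise": {
        "en": "A man is playing a guitar.",
        "de": "Ein Mann spielt Gitarre.",
        "fr": "Un homme joue de la guitare.",
    },
    "hypothesis": {
        "en": "A person is making music.",
        "de": "Eine Person macht Musik.",
        "fr": "Une personne fait de la musique.",
    },
    "label": "entailment",
}

SEGMENTS = ("premise", "hypothesis")


def xlda_augment(example, source_lang, rng):
    """XLDA-style sketch: swap one segment for its translation in another language."""
    # Pick which segment to replace and which other language to use.
    segment = rng.choice(SEGMENTS)
    candidates = sorted(lang for lang in example[segment] if lang != source_lang)
    target_lang = rng.choice(candidates)

    # Start from the source-language example, then replace the chosen segment.
    augmented = {seg: example[seg][source_lang] for seg in SEGMENTS}
    augmented[segment] = example[segment][target_lang]
    augmented["label"] = example["label"]  # the label is unchanged
    return augmented, segment, target_lang


def naive_aggregate(example):
    """Naive baseline sketch: one copy per language, each wholly in that language."""
    languages = sorted(example[SEGMENTS[0]])
    return [
        {
            "premise": example["premise"][lang],
            "hypothesis": example["hypothesis"][lang],
            "label": example["label"],
            "lang": lang,
        }
        for lang in languages
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    augmented, segment, target_lang = xlda_augment(PARALLEL_EXAMPLE, "en", rng)
    print("XLDA:", segment, "->", target_lang, augmented)
    for monolingual in naive_aggregate(PARALLEL_EXAMPLE):
        print("naive:", monolingual)
```

The difference is that the XLDA-style example mixes languages within a single input (for instance an English premise with a German hypothesis), while every example from the naive baseline stays in one language.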
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
