XNLI 2.0: Improving XNLI dataset and performance on Cross Lingual Understanding (XLU)
Ankit Kumar Upadhyay, Harsit Kumar Upadhya

TL;DR
This paper enhances the XNLI dataset by re-translating it with Google Translate and evaluates cross-lingual natural language inference performance across 15 languages, including low-resource ones, using improved datasets and multilingual models.
Contribution
The work improves the XNLI dataset through re-translation and analyzes cross-lingual NLI performance, especially in low-resource languages, leveraging multilingual models.
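The re-translation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `NLIExample` type, the `retranslate` function, and the `fake_translate` stand-in are all hypothetical names introduced here. The paper reports using Google Translate, but this sketch takes the translator as a plain callable, so no real translation API is called or assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair with its inference label (hypothetical type)."""
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"
    lang: str = "en"


# Assumed translator signature: (text, target_lang) -> translated text.
Translator = Callable[[str, str], str]


def retranslate(
    examples: Iterable[NLIExample],
    target_lang: str,
    translate: Translator,
) -> List[NLIExample]:
    """Translate both sentences of each pair; the label is copied unchanged."""
    translated = []
    for ex in examples:
        translated.append(
            NLIExample(
                premise=translate(ex.premise, target_lang),
                hypothesis=translate(ex.hypothesis, target_lang),
                label=ex.label,
                lang=target_lang,
            )
        )
    return translated


def fake_translate(text: str, target_lang: str) -> str:
    """Stand-in translator for demonstration only; it just tags the text."""
    return f"[{target_lang}] {text}"
```

Calling `retranslate(english_examples, "hi", fake_translate)` returns a new list whose labels match the English source. In this sketch the premise and hypothesis are translated independently, and the inference label is assumed to carry over to the translated pair.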
Findings
The improved dataset leads to better cross-lingual inference performance.
Multilingual models perform well even on low-resource languages.
Re-translation reduces dataset noise and enhances model training.
Abstract
Natural Language Processing systems are heavily dependent on the availability of annotated data to train practical models. Primarily, models are trained on English datasets. In recent times, significant advances have been made in multilingual understanding due to the steeply increasing necessity of working in different languages. One of the points that stands out is that since there are now so many pre-trained multilingual models, we can utilize them for cross-lingual understanding tasks. Using cross-lingual understanding and Natural Language Inference, it is possible to train models whose applications extend beyond the training language. We can leverage the power of machine translation to skip the tiresome part of translating datasets from one language to another. In this work, we focus on improving the original XNLI dataset by re-translating the MNLI dataset in all of the 14 different…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Test
