XNLI: Evaluating Cross-lingual Sentence Representations
Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, Veselin Stoyanov

TL;DR
This paper introduces XNLI, a multilingual dataset for evaluating cross-lingual sentence understanding, extending MultiNLI to 15 languages, and provides baseline models demonstrating the challenge of cross-lingual transfer.
Contribution
The creation of the XNLI dataset for cross-lingual evaluation and baseline models for multilingual sentence understanding.
Findings
Directly translating the test data yields the best performance among the baselines evaluated.
XNLI is a practical, challenging evaluation suite.
Multilingual models still lag behind monolingual performance.
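The first finding can be illustrated with a minimal sketch of a translate-test evaluation loop. It assumes that "translating test data" means each non-English premise/hypothesis pair is machine-translated into English and then scored by an English-only classifier. The function name `translate_test_accuracy`, the toy dictionary "translator", and the rule-based "classifier" are all hypothetical stand-ins for illustration, not the paper's systems.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# One evaluation item: (language code, premise, hypothesis, gold label).
# Gold labels follow the three NLI classes: entailment, neutral, contradiction.
Example = Tuple[str, str, str, str]


def translate_test_accuracy(
    examples: Iterable[Example],
    translate_to_english: Callable[[str, str], str],
    english_classifier: Callable[[str, str], str],
) -> Dict[str, float]:
    """Per-language accuracy when test pairs are translated to English first."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lang, premise, hypothesis, gold in examples:
        if lang != "en":
            # Translate both sentences so the English-only model can read them.
            premise = translate_to_english(premise, lang)
            hypothesis = translate_to_english(hypothesis, lang)
        predicted = english_classifier(premise, hypothesis)
        correct[lang] += int(predicted == gold)
        total[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Hypothetical stand-ins: a lookup-table "translator" and a rule-based "classifier".
TOY_DICTIONARY = {
    ("fr", "Un chien court."): "A dog runs.",
    ("fr", "Un animal bouge."): "An animal moves.",
    ("fr", "Le chien dort."): "The dog sleeps.",
}


def toy_translate(text: str, lang: str) -> str:
    return TOY_DICTIONARY[(lang, text)]


def toy_classifier(premise: str, hypothesis: str) -> str:
    # Placeholder rule; a real system would be a trained NLI model.
    return "entailment" if hypothesis == "An animal moves." else "contradiction"


toy_examples = [
    ("en", "A dog runs.", "An animal moves.", "entailment"),
    ("en", "A dog runs.", "The dog is brown.", "neutral"),
    ("fr", "Un chien court.", "Un animal bouge.", "entailment"),
    ("fr", "Un chien court.", "Le chien dort.", "contradiction"),
]

accuracy_by_language = translate_test_accuracy(
    toy_examples, toy_translate, toy_classifier
)
print(accuracy_by_language)
```

Passing the translator and classifier as callables keeps the loop independent of any particular translation system or model, so the same function can score different baselines per language.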
Abstract
State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing interest in cross-lingual language understanding (XLU) and low-resource cross-language transfer. In this work, we construct an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus (MultiNLI) to 15 languages, including low-resource languages such as Swahili and Urdu. We hope that our dataset, dubbed XNLI, will catalyze research in cross-lingual sentence understanding by providing an informative standard evaluation task. In addition, we provide several baselines for multilingual…
Code & Models
- 🤗 MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 (model · 193k dl · ♡ 355)
- 🤗 MoritzLaurer/xlm-v-base-mnli-xnli (model · 171 dl · ♡ 23)
- 🤗 FacebookAI/roberta-large-mnli (model · 295k dl · ♡ 210)
- 🤗 DeepPavlov/bert-base-multilingual-cased-sentence (model · 78 dl · ♡ 4)
- 🤗 DeepPavlov/rubert-base-cased-sentence (model · 13k dl · ♡ 29)
- 🤗 MoritzLaurer/mDeBERTa-v3-base-mnli-xnli (model · 203k dl · ♡ 298)
- 🤗 inokufu/flaubert-base-uncased-xnli-sts-finetuned-education (model · 9 dl · ♡ 1)
- 🤗 inokufu/flaubert-base-uncased-xnli-sts (model · 26 dl · ♡ 7)
- 🤗 microsoft/Multilingual-MiniLM-L12-H384 (model · 60k dl · ♡ 100)
- 🤗 inokufu/bert-base-uncased-xnli-sts-finetuned-education (model · 8 dl)
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
