XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Junjie Hu; Sebastian Ruder; Aditya Siddhant; Graham Neubig; Orhan Firat; Melvin Johnson

arXiv:2003.11080 · cs.CL · September 7, 2020 · 299 cites

TL;DR

XTREME is a comprehensive multilingual benchmark evaluating cross-lingual generalization across 40 languages and 9 tasks, revealing performance gaps and encouraging research in multilingual transfer learning.

Contribution

Introduces the XTREME benchmark for evaluating multilingual models across diverse languages and tasks, filling a gap in comprehensive cross-lingual evaluation tools.

Findings

01

Models perform well on English, reaching human levels on many tasks.

02

Significant performance gaps exist in cross-lingual transfer, especially in syntax and retrieval.

03

Results vary widely across different languages.

Abstract

Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications