XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation

Sebastian Ruder; Noah Constant; Jan Botha; Aditya Siddhant; Orhan Firat; Jinlan Fu; Pengfei Liu; Junjie Hu; Dan Garrette; Graham Neubig; Melvin Johnson

arXiv:2104.07412 · cs.CL · October 8, 2021


TL;DR

XTREME-R advances multilingual NLP evaluation by introducing more challenging tasks, a broader language set, and diagnostic tools to better understand model capabilities and limitations.

Contribution

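The paper extends the XTREME benchmark to XTREME-R, which comprises ten natural language understanding tasks (including challenging language-agnostic retrieval tasks) covering 50 typologically diverse languages, together with diagnostic tools for comprehensive multilingual model evaluation.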

Findings

01

State-of-the-art performance on the XTREME benchmark improved by more than 13 points over the past year.

02

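Identification of challenges in cross-lingual transfer learning: improvements have come more easily on some tasks than on others, and a sizeable gap to human-level performance remains.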

03

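Enhanced understanding of model strengths and weaknesses through a massively multilingual diagnostic suite (MultiCheckList) and fine-grained multi-dataset evaluation.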

Abstract

Machine learning has brought striking advances in multilingual natural language processing capabilities over the past year. For example, the latest techniques have improved the state-of-the-art performance on the XTREME multilingual benchmark by more than 13 points. While a sizeable gap to human-level performance remains, improvements have been easier to achieve in some tasks than in others. This paper analyzes the current state of cross-lingual transfer learning and summarizes some lessons learned. In order to catalyze meaningful progress, we extend XTREME to XTREME-R, which consists of an improved set of ten natural language understanding tasks, including challenging language-agnostic retrieval tasks, and covers 50 typologically diverse languages. In addition, we provide a massively multilingual diagnostic suite (MultiCheckList) and fine-grained multi-dataset evaluation capabilities…
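The abstract summarizes benchmark-level progress as a single number. The following is a minimal sketch of one common way such a number can be formed, assuming (this is not stated above) that each task's metric is first averaged over its languages and that the overall score is the unweighted mean of those per-task averages. The function names, task names, and scores are hypothetical toy values, not results from the paper.

```python
from statistics import mean


def task_score(per_language: dict[str, float]) -> float:
    """Unweighted mean of one task's metric across the languages it covers."""
    return mean(per_language.values())


def benchmark_score(results: dict[str, dict[str, float]]) -> float:
    """Unweighted mean of per-task scores (an assumed aggregation scheme)."""
    return mean(task_score(langs) for langs in results.values())


# Hypothetical toy numbers for illustration only, not results from the paper.
results = {
    "task_a": {"en": 90.0, "sw": 70.0},
    "task_b": {"en": 80.0, "sw": 60.0, "yo": 40.0},
}

print(benchmark_score(results))  # 70.0
```

Under this scheme, a task evaluated on more languages does not outweigh one evaluated on fewer: pooling all five language-task pairs directly would instead give 68.0.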


Code & Models

Repositories

google-research/xtreme
PyTorch · Official

Datasets

izhx/xtreme-r-udpos
dataset · 155 downloads
