RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

Tatiana Shavrina; Alena Fenogenova; Anton Emelyanov; Denis Shevelev; Ekaterina Artemova; Valentin Malykh; Vladislav Mikhailov; Maria Tikhonova; Andrey Chertok; Andrey Evlampiev

arXiv:2010.15925·cs.CL·October 4, 2023

TL;DR

This paper introduces RussianSuperGLUE, a comprehensive benchmark for evaluating Russian language understanding across nine tasks, enabling better diagnostics of transformer models' general skills.

Contribution

It presents the first Russian language understanding benchmark with nine tasks, baselines, human evaluation, and an open-source evaluation framework.

Findings

1. Established baseline performances for Russian language models.
2. Compared multilingual models on Russian diagnostic tasks.
3. Provided a leaderboard for Russian language understanding models.

Abstract

In this paper, we introduce an advanced Russian general language understanding evaluation benchmark, RussianGLUE. Recent advances in universal language models and transformers call for a methodology for broad diagnostics of these models and for testing their general intellectual skills: natural language inference, commonsense reasoning, and the ability to perform simple logical operations regardless of text subject or lexicon. For the first time, a benchmark of nine tasks, collected and organized analogously to the SuperGLUE methodology, has been developed from scratch for the Russian language. We provide baselines, a human-level evaluation, an open-source framework for evaluating models (https://github.com/RussianNLP/RussianSuperGLUE), and an overall leaderboard of transformer models for the Russian language. In addition, we present the first results of comparing…
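A leaderboard over several tasks usually ranks models by a single aggregate number. The following is a minimal sketch of one common convention, assuming the overall score is the unweighted mean of per-task scores, with tasks that report more than one metric averaged first. The aggregation rule, the function name `overall_score`, the task keys, and all numbers are assumptions for illustration; none of them are taken from the paper or its repository.

```python
from statistics import mean
from typing import Dict, List


def overall_score(task_metrics: Dict[str, List[float]]) -> float:
    """Average each task's metrics, then take the unweighted mean over tasks.

    Assumed convention for illustration only, not the benchmark's documented rule.
    """
    if not task_metrics:
        raise ValueError("no task scores given")
    # Collapse multi-metric tasks to one number so every task weighs the same.
    per_task = [mean(values) for values in task_metrics.values()]
    return mean(per_task)


# Made-up placeholder numbers, NOT results from the paper.
example = {
    "task_a": [0.60],        # one metric, e.g. accuracy
    "task_b": [0.40, 0.50],  # two metrics, e.g. F1 and accuracy
    "task_c": [0.70],
}
print(round(overall_score(example), 4))
```

Averaging within a task first keeps a task with two metrics from counting twice in the final number.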
