RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs

Ekaterina Taktasheva; Maxim Bazhukov; Kirill Koncha; Alena Fenogenova; Ekaterina Artemova; Vladislav Mikhailov

arXiv:2406.19232 · cs.CL · October 3, 2024

TL;DR

RuBLiMP is a comprehensive Russian benchmark of 45,000 linguistic minimal pairs, designed to evaluate how well language models grasp diverse grammatical phenomena. It is built from automatically annotated sentences and carefully curated test data.

Contribution

It introduces a novel, large-scale Russian minimal pairs benchmark created via linguistic perturbations, expanding evaluation resources for language models.

Findings

1. Models are sensitive to morphological and agreement contrasts.

2. Models underperform humans on structural, negation, transitivity, and tense phenomena.

3. The benchmark is publicly available for further research.

Abstract

Minimal pairs are a well-established approach to evaluating the grammatical knowledge of language models. However, existing resources for minimal pairs address a limited number of languages and lack diversity of language-specific grammatical phenomena. This paper introduces the Russian Benchmark of Linguistic Minimal Pairs (RuBLiMP), which includes 45k pairs of sentences that differ in grammaticality and isolate a morphological, syntactic, or semantic phenomenon. In contrast to existing benchmarks of linguistic minimal pairs, RuBLiMP is created by applying linguistic perturbations to automatically annotated sentences from open text corpora and carefully curating test data. We describe the data collection protocol and present the results of evaluating 25 language models in various scenarios. We find that the widely used language models for Russian are sensitive to morphological and…
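In minimal-pair evaluation generally (for example, BLiMP-style setups), a model counts as correct on a pair when it assigns a higher probability to the grammatical sentence than to its ungrammatical counterpart. The sketch below illustrates that scoring rule only. It assumes per-token log-probabilities have already been produced by some language model. The function names and the numbers in the usage example are hypothetical and are not taken from the RuBLiMP repository or paper.

```python
from typing import Sequence, Tuple

# A pair is (token log-probs of grammatical sentence, token log-probs of ungrammatical sentence).
Pair = Tuple[Sequence[float], Sequence[float]]


def sentence_logprob(token_logprobs: Sequence[float]) -> float:
    """Sentence score = sum of per-token log-probabilities (log of the joint probability)."""
    return sum(token_logprobs)


def minimal_pair_accuracy(pairs: Sequence[Pair]) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(
        1
        for good, bad in pairs
        if sentence_logprob(good) > sentence_logprob(bad)
    )
    return correct / len(pairs)


# Hypothetical log-probabilities for two pairs.
pairs = [
    ([-2.1, -0.5, -1.0], [-2.1, -3.2, -1.0]),  # grammatical wins: -3.6 vs -6.3
    ([-1.0, -1.0], [-0.8, -1.1]),              # ungrammatical wins: -2.0 vs -1.9
]
print(minimal_pair_accuracy(pairs))
```

Summing raw log-probabilities favors shorter sentences, which matters less here because the two sentences in a minimal pair are usually close in length. Other setups normalize by token count instead.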

Code & Models

Repositories

russiannlp/rublimp (official)

Datasets

RussianNLP/rublimp (dataset, 351 downloads)

Videos

RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs (Underline)

Taxonomy

Topics: Natural Language Processing Techniques