TAPE: Assessing Few-shot Russian Language Understanding

Ekaterina Taktasheva; Tatiana Shavrina; Alena Fenogenova; Denis Shevelev; Nadezhda Katricheva; Maria Tikhonova; Albina Akhmetgareeva; Oleg Zinkevich; Anastasiia Bashmakova; Svetlana Iordanskaia; Alena Spiridonova; Valentina Kurenshchikova; Ekaterina Artemova; Vladislav Mikhailov

arXiv:2210.12813 · cs.CL · October 4, 2023

TL;DR

This paper introduces TAPE, a comprehensive benchmark for evaluating Russian language understanding in zero-shot and few-shot scenarios, addressing the lack of standardized evaluation tools for non-English languages.

Contribution

It presents TAPE, a novel benchmark with complex NLU tasks for Russian, including adversarial attacks and subpopulation analysis, to improve robustness and generalization evaluation.
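
To make the idea of subpopulation analysis concrete, here is a minimal sketch that reports accuracy separately for each slice of the evaluation data. It is not TAPE's implementation: the slicing rule (`length_bucket`), the `short_max` threshold, and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def length_bucket(text, short_max=5):
    """Hypothetical slicing rule: split examples by whitespace token count."""
    return "short" if len(text.split()) <= short_max else "long"


def accuracy_by_subpopulation(examples, predictions, slice_fn=length_bucket):
    """Return accuracy per slice for (text, label) pairs and aligned predictions."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for (text, label), pred in zip(examples, predictions):
        key = slice_fn(text)
        totals[key] += 1
        hits[key] += int(pred == label)
    return {key: hits[key] / totals[key] for key in totals}
```

A single aggregate score can hide a slice on which a model fails badly; reporting per-slice accuracy is what makes that kind of gap visible.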

Findings

01 Simple spelling perturbations significantly impact performance.

02 Paraphrasing has a negligible effect on model accuracy.

03 Neural models lag behind human performance on most tasks.

Abstract

Recent advances in zero-shot and few-shot learning have shown promise for a range of research and practical purposes. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this gap, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark that includes six more complex NLU tasks for Russian, covering multi-hop reasoning, ethical concepts, logic and commonsense knowledge. TAPE's design focuses on systematic zero-shot and few-shot NLU evaluation: (i) linguistic-oriented adversarial attacks and perturbations for analyzing robustness, and (ii) subpopulations for nuanced interpretation. The detailed analysis of testing the autoregressive baselines indicates that simple spelling-based perturbations affect performance the most, while paraphrasing…
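
As a rough illustration of what a spelling-based perturbation and a robustness measurement can look like, the sketch below swaps adjacent letters at a fixed rate and compares accuracy before and after. This is an assumption-laden toy, not the benchmark's own attack suite: `swap_adjacent_chars`, `robustness_gap`, the `rate` parameter, and the `predict` callable are all hypothetical names introduced here.

```python
import random


def swap_adjacent_chars(text, rate=0.1, seed=0):
    """Swap neighbouring letters with probability `rate` (hypothetical perturbation)."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


def accuracy(predict, examples):
    """Fraction of (text, label) pairs for which predict(text) equals the label."""
    return sum(predict(text) == label for text, label in examples) / len(examples)


def robustness_gap(predict, examples, rate=0.1, seed=0):
    """Accuracy on clean inputs minus accuracy on perturbed inputs."""
    perturbed = [(swap_adjacent_chars(text, rate, seed), label) for text, label in examples]
    return accuracy(predict, examples) - accuracy(predict, perturbed)
```

A larger gap means the model is more sensitive to the perturbation. Only characters are reordered, so the perturbed text keeps the same length and the same multiset of characters as the input.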

Code & Models

Repositories

RussianNLP/TAPE
Official

Datasets

RussianNLP/tape
dataset · 798 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning