This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish

Łukasz Augustyniak; Kamil Tagowski; Albert Sawczyn; Denis Janiak; Roman Bartusiak; Adrian Szymczak; Marcin Wątroba; Arkadiusz Janz; Piotr Szymański; Mikołaj Morzy; Tomasz Kajdanowicz; Maciej Piasecki

arXiv:2211.13112·cs.CL·November 24, 2022·5 cites

Open Access · 1 Repo · 1 Datasets · 1 Video

TL;DR

This paper introduces LEPISZCZE, a comprehensive and flexible benchmark for Polish NLP, addressing the gap in standardized evaluation tools for low-resource languages and providing a blueprint for similar efforts.

Contribution

The paper presents LEPISZCZE, a new extensive benchmark for Polish NLP, including design principles, dataset integration, and initial experimental results, serving as a model for other low-resource languages.

Findings

01 LEPISZCZE includes 13 experiments with recent Polish language models.

02 The benchmark incorporates five existing and eight novel datasets.

03 Insights from creating LEPISZCZE can guide similar benchmarks for other languages.

Abstract

The availability of compute and data to train larger and larger language models increases the demand for robust methods of benchmarking the true progress of LM training. Recent years witnessed significant progress in standardized benchmarking for English. Benchmarks such as GLUE, SuperGLUE, or KILT have become de facto standard tools to compare large language models. Following the trend to replicate GLUE for other languages, the KLEJ benchmark has been released for Polish. In this paper, we evaluate the progress in benchmarking for low-resourced languages. We note that only a handful of languages have such comprehensive benchmarks. We also note the gap in the number of tasks being evaluated by benchmarks for resource-rich English/Chinese and the rest of the world. In this paper, we introduce LEPISZCZE (the Polish word for glew, the Middle English predecessor of glue), a new,…

Code & Models

Repositories

clarin-pl/lepiszcze
pytorch · Official

Datasets

mteb/PAC
dataset · 389 dl

Videos

This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish · slideslive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Test