EENLP: Cross-lingual Eastern European NLP Index

Alexey Tikhonov; Alex Malkhasov; Andrey Manoshin; George Dima; Réka Cserháti; Md.Sadek Hossain Asif; Matt Sárdi

arXiv:2108.02605 · cs.CL · May 12, 2022

TL;DR

This paper introduces EENLP, an index of NLP resources for Eastern European languages (90+ datasets and 45+ models), together with hand-crafted cross-lingual datasets for five semantic tasks. It aims to address resource scarcity and establish performance baselines for multilingual models.

Contribution

It provides the first extensive index of Eastern European NLP datasets and models, along with cross-lingual semantic datasets for evaluation.

Findings

1. Multilingual models show varying performance across Eastern European languages.

2. The index facilitates resource discovery and community collaboration.

3. Baseline results highlight gaps and opportunities for future research.

Abstract

Motivated by the sparsity of NLP resources for Eastern European languages, we present a broad index of existing Eastern European language resources (90+ datasets and 45+ models), published as a GitHub repository open for updates from the community. Furthermore, to support the evaluation of commonsense reasoning tasks, we provide hand-crafted cross-lingual datasets for five different semantic tasks (namely news categorization, paraphrase detection, Natural Language Inference (NLI), tweet sentiment detection, and news sentiment detection) for some of the Eastern European languages. We perform several experiments with existing multilingual models on these datasets to define performance baselines and compare them to existing results for other languages.

Code & Models

Repositories

altsoph/EENLP (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification