ScandEval: A Benchmark for Scandinavian Natural Language Processing

Dan Saattrup Nielsen

arXiv:2304.00906 · cs.CL · April 4, 2023 · 5 cites

TL;DR

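This paper presents ScandEval, a benchmarking platform that evaluates pretrained models on four tasks in the Scandinavian languages. It introduces new datasets for two of those tasks (linguistic acceptability and question answering), the scandeval Python package and command-line interface, and an analysis of cross-lingual transfer and model performance across the Scandinavian languages.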

Contribution

The paper contributes new datasets for linguistic acceptability and question answering, the scandeval benchmarking package, and an analysis of more than 100 Scandinavian or multilingual models, with the results published in an interactive online leaderboard.

Findings

01 Substantial cross-lingual transfer among Mainland Scandinavian languages.

02 Limited transfer between Mainland and Insular Scandinavian languages.

03 Norwegian, Swedish, and Danish models outperform multilingual models.

Abstract

This paper introduces a Scandinavian benchmarking platform, ScandEval, which can benchmark any pretrained model on four different tasks in the Scandinavian languages. The datasets used in two of the tasks, linguistic acceptability and question answering, are new. We develop and release a Python package and command-line interface, scandeval, which can benchmark any model that has been uploaded to the Hugging Face Hub, with reproducible results. Using this package, we benchmark more than 100 Scandinavian or multilingual models and present the results of these in an interactive online leaderboard, as well as provide an analysis of the results. The analysis shows that there is substantial cross-lingual transfer among the Mainland Scandinavian languages (Danish, Swedish and Norwegian), with limited cross-lingual transfer between the group of Mainland Scandinavian languages and the group of…
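
The abstract stresses reproducible results but this excerpt does not describe how scandeval achieves them. As a generic illustration only, the sketch below shows one common way to make a resampled evaluation repeatable: draw bootstrap resamples from a locally seeded random generator. The names `accuracy`, `bootstrap_scores`, `toy_predict`, and `toy_examples` are hypothetical and are not part of the scandeval package, and the toy data are placeholders rather than results from the paper.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: (input text, gold label). Not a scandeval type.
Example = Tuple[str, str]


def accuracy(predict: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Fraction of examples where the prediction matches the gold label."""
    return mean(1.0 if predict(text) == gold else 0.0 for text, gold in examples)


def bootstrap_scores(
    predict: Callable[[str], str],
    examples: Sequence[Example],
    n_resamples: int = 10,
    seed: int = 0,
) -> List[float]:
    """Score the model on `n_resamples` bootstrap resamples of `examples`.

    A local, seeded generator is used so that the same seed always
    produces the same resamples and therefore the same scores.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_resamples):
        # Sample with replacement, keeping the resample the same size.
        resample = [rng.choice(examples) for _ in examples]
        scores.append(accuracy(predict, resample))
    return scores


def toy_predict(text: str) -> str:
    """Placeholder 'model': calls a sentence acceptable if it ends in a period."""
    return "correct" if text.endswith(".") else "incorrect"


# Placeholder data for illustration only; not taken from any ScandEval dataset.
toy_examples: List[Example] = [
    ("example one.", "correct"),
    ("example two", "incorrect"),
    ("example three.", "incorrect"),
    ("example four", "correct"),
]

if __name__ == "__main__":
    first = bootstrap_scores(toy_predict, toy_examples, seed=1)
    second = bootstrap_scores(toy_predict, toy_examples, seed=1)
    print(first == second)  # identical seeds give identical score lists
```

Using a local `random.Random(seed)` instead of the module-level functions keeps the resampling independent of any other code that touches the global random state, which is one reason this pattern is common in evaluation harnesses.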

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques