DUMB: A Benchmark for Smart Evaluation of Dutch Models

Wietse de Vries; Martijn Wieling; Malvina Nissim

arXiv:2305.13026 · cs.CL · October 16, 2023 · 1 citation

TL;DR

The paper introduces DUMB, a comprehensive Dutch language model benchmark with diverse datasets and a novel evaluation metric, enabling more accurate assessment and fostering future research on Dutch NLP models.

Contribution

It presents the Dutch Model Benchmark (DUMB), including new datasets and the Relative Error Reduction metric, to improve evaluation and comparison of Dutch language models.

Findings

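1. Current Dutch monolingual models underperform relative to the best multilingual and English models.
2. Larger models, and architectures and pre-training objectives beyond those of current Dutch models, are associated with higher performance.
3. DeBERTaV3, XLM-R, and mDeBERTaV3 achieve the top results.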

Abstract

We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks. The total set of nine tasks includes four tasks that were previously not available in Dutch. Instead of relying on a mean score across tasks, we propose Relative Error Reduction (RER), which compares the DUMB performance of language models to a strong baseline which can be referred to in the future even when assessing different sets of language models. Through a comparison of 14 pre-trained language models (mono- and multi-lingual, of varying sizes), we assess the internal consistency of the benchmark tasks, as well as the factors that likely enable high performance. Our results indicate that current Dutch monolingual models under-perform and suggest training larger Dutch models with other architectures and pre-training objectives. At present, the…
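The Relative Error Reduction idea in the abstract can be illustrated with a minimal Python sketch. It assumes that every task score is a fraction in [0, 1] where higher is better, defines error as 1 − score, and takes RER as the share of the baseline's error that a model removes, averaged over tasks. The function names, task names, and scores are hypothetical, and since the abstract does not name the baseline model, the baseline's scores are simply passed in.

```python
from statistics import mean


def error(score: float) -> float:
    """Error for a score in [0, 1] where higher is better (assumption)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return 1.0 - score


def relative_error_reduction(model_score: float, baseline_score: float) -> float:
    """Fraction of the baseline's error removed by the model.

    Positive when the model beats the baseline, zero when they tie,
    negative when the model is worse.
    """
    baseline_error = error(baseline_score)
    if baseline_error == 0.0:
        raise ValueError("baseline has zero error; RER is undefined")
    return (baseline_error - error(model_score)) / baseline_error


def average_rer(model_scores: dict[str, float],
                baseline_scores: dict[str, float]) -> float:
    """Mean per-task RER over the tasks present in both score tables."""
    tasks = sorted(model_scores.keys() & baseline_scores.keys())
    if not tasks:
        raise ValueError("no tasks shared by model and baseline")
    return mean(
        relative_error_reduction(model_scores[t], baseline_scores[t])
        for t in tasks
    )


if __name__ == "__main__":
    # Hypothetical scores: the same +0.02 gain on a near-ceiling and a harder task.
    baseline = {"easy_task": 0.95, "hard_task": 0.60}
    model = {"easy_task": 0.97, "hard_task": 0.62}
    for task in sorted(baseline):
        print(task, round(relative_error_reduction(model[task], baseline[task]), 3))
    print("mean RER", round(average_rer(model, baseline), 3))
```

With these hypothetical scores both tasks gain two points, yet RER credits the near-ceiling task with about 0.4 and the harder task with about 0.05, a difference that a plain mean of raw scores would hide. Because RER is expressed relative to a fixed baseline, a new model can be placed on the same scale later without re-running the models it is compared with.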

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: XLM-R