SICK-NL: A Dataset for Dutch Natural Language Inference

Gijs Wijnholds; Michael Moortgat

arXiv:2101.05716 · cs.CL · January 15, 2021

TL;DR

SICK-NL is a Dutch Natural Language Inference dataset obtained by translating the English SICK dataset. The parallel data makes it possible to compare monolingual and multilingual models across English and Dutch, and the comparison shows that models score lower on the Dutch version.

Contribution

This paper introduces SICK-NL, a Dutch NLI dataset, together with baseline evaluations on both SICK and SICK-NL and two stress tests that check whether Dutch models handle syntactic restructurings that leave meaning unchanged.

Findings

01. Models perform worse on SICK-NL than on SICK

02. Dutch models struggle with syntactic restructuring

03. Dutch NLP models do not fully capture word order flexibility

Abstract

We present SICK-NL (read: signal), a dataset targeting Natural Language Inference in Dutch. SICK-NL is obtained by translating the SICK dataset of Marelli et al. (2014) from English into Dutch. Having a parallel inference dataset allows us to compare both monolingual and multilingual NLP models for English and Dutch on the two tasks. In the paper, we motivate and detail the translation process, and perform a baseline evaluation on both the original SICK dataset and its Dutch incarnation SICK-NL, taking inspiration from Dutch skipgram embeddings and contextualised embedding models. In addition, we encapsulate two phenomena encountered in the translation to formulate stress tests and verify how well the Dutch models capture syntactic restructurings that do not affect semantics. Our main finding is that all models perform worse on SICK-NL than on SICK, indicating that the Dutch dataset is more…
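
The stress-test idea mentioned in the abstract (checking that a restructuring which does not affect semantics leaves a model's prediction unchanged) can be illustrated with a simple prediction-consistency check. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names, the two toy classifiers, and the example sentence pairs are all hypothetical and are not taken from SICK-NL.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (premise, hypothesis)


def consistency_rate(
    predict: Callable[[str, str], str],
    originals: List[Pair],
    rewrites: List[Pair],
) -> float:
    """Fraction of items whose predicted label is unchanged after a rewrite."""
    if len(originals) != len(rewrites):
        raise ValueError("originals and rewrites must be aligned one-to-one")
    if not originals:
        raise ValueError("need at least one example")
    same = sum(
        predict(*orig) == predict(*new)
        for orig, new in zip(originals, rewrites)
    )
    return same / len(originals)


def bag_of_words_predict(premise: str, hypothesis: str) -> str:
    """Hypothetical toy model: ignores word order entirely."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return "ENTAILMENT" if h <= p else "NEUTRAL"


def order_sensitive_predict(premise: str, hypothesis: str) -> str:
    """Hypothetical toy model: only fires when the premise starts with the hypothesis."""
    return "ENTAILMENT" if premise.lower().startswith(hypothesis.lower()) else "NEUTRAL"


# Made-up illustrative pairs (not dataset items): each rewrite reorders the
# premise without changing its meaning.
originals = [
    ("de man loopt op straat", "de man loopt"),
    ("een hond rent snel", "een kat rent"),
]
rewrites = [
    ("op straat loopt de man", "de man loopt"),
    ("snel rent een hond", "een kat rent"),
]

robust = consistency_rate(bag_of_words_predict, originals, rewrites)
brittle = consistency_rate(order_sensitive_predict, originals, rewrites)
```

A model that is robust to the restructuring keeps a consistency rate close to 1, while a model that relies on surface word order drops below it. In a real setting, `predict` would wrap a trained NLI model rather than these toy heuristics.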

Code & Models

Repositories

gijswijnholds/sick_nl
PyTorch · Official
