TL;DR
SICK-NL is a Dutch Natural Language Inference (NLI) dataset obtained by translating the English SICK dataset. It enables direct comparison of monolingual and multilingual models and reveals challenges in modeling Dutch.
Contribution
This paper introduces SICK-NL, a Dutch NLI dataset, and provides baseline evaluations, stress tests, and insights into the challenges of modeling Dutch.
Findings
Models perform worse on SICK-NL than on SICK
Dutch models struggle with syntactic restructuring
Dutch NLP models do not fully capture word order flexibility
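The second and third findings concern meaning-preserving syntactic restructurings, such as moving an adverb to the front of a Dutch main clause and inverting subject and verb. The following is a minimal sketch of a generic invariance check, not the paper's actual stress-test procedure. It assumes a `predict(premise, hypothesis)` function that returns one of the three SICK labels. The two predictors and the Dutch example items are hypothetical toys.

```python
from typing import Callable, List, Tuple

# Hypothetical label set, following SICK's three-way NLI labels.
Label = str  # "ENTAILMENT", "NEUTRAL" or "CONTRADICTION"

def invariance_rate(
    predict: Callable[[str, str], Label],
    items: List[Tuple[str, str, str, str]],
) -> float:
    """Fraction of items whose predicted label is unchanged after restructuring.

    Each item is (premise, hypothesis, restructured_premise, restructured_hypothesis),
    where the restructured sentences are assumed to mean the same as the originals.
    """
    if not items:
        raise ValueError("need at least one item")
    stable = sum(predict(p, h) == predict(rp, rh) for p, h, rp, rh in items)
    return stable / len(items)

def exact_match_predict(premise: str, hypothesis: str) -> Label:
    # Hypothetical, deliberately word-order-sensitive baseline.
    return "ENTAILMENT" if premise == hypothesis else "NEUTRAL"

def bag_of_words_predict(premise: str, hypothesis: str) -> Label:
    # Hypothetical order-insensitive baseline: hypothesis words must all occur in the premise.
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return "ENTAILMENT" if h <= p else "NEUTRAL"

# Toy items: adverb fronting with subject-verb inversion (verb-second order).
items = [
    ("De man speelt nu gitaar", "De man speelt nu gitaar",
     "Nu speelt de man gitaar", "De man speelt nu gitaar"),
    ("Een vrouw snijdt vandaag een ui", "Een vrouw snijdt een ui",
     "Vandaag snijdt een vrouw een ui", "Een vrouw snijdt een ui"),
]

print(invariance_rate(exact_match_predict, items))   # 0.5
print(invariance_rate(bag_of_words_predict, items))  # 1.0
```

Invariance is a necessary condition, not a sufficient one. A bag-of-words predictor is trivially invariant to word order, yet it cannot tell apart sentences whose meaning does depend on order.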
Abstract
We present SICK-NL (read: signal), a dataset targeting Natural Language Inference in Dutch. SICK-NL is obtained by translating the SICK dataset of Marelli et al. (2014) from English into Dutch. Having a parallel inference dataset allows us to compare both monolingual and multilingual NLP models for English and Dutch on the two tasks. In the paper, we motivate and detail the translation process, perform a baseline evaluation on both the original SICK dataset and its Dutch incarnation SICK-NL, taking inspiration from Dutch skipgram embeddings and contextualised embedding models. In addition, we encapsulate two phenomena encountered in the translation to formulate stress tests and verify how well the Dutch models capture syntactic restructurings that do not affect semantics. Our main finding is that all models perform worse on SICK-NL than on SICK, indicating that the Dutch dataset is more…
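Because SICK-NL is a translation of SICK, every Dutch pair has an English counterpart, so models can be scored on aligned items in both languages. The following is a minimal sketch of such a parallel comparison. It assumes predictions are aligned by pair index and that each translated pair keeps the gold label of its English source. The function names and toy labels are hypothetical.

```python
from typing import Dict, Sequence

def accuracy(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Share of predictions that match the gold labels position by position."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("predictions and gold labels must be non-empty and aligned")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def compare_parallel(
    gold: Sequence[str], pred_en: Sequence[str], pred_nl: Sequence[str]
) -> Dict[str, float]:
    """Score aligned English (SICK) and Dutch (SICK-NL) predictions on shared gold labels."""
    acc_en = accuracy(pred_en, gold)
    acc_nl = accuracy(pred_nl, gold)
    if len(pred_en) != len(pred_nl):
        raise ValueError("English and Dutch predictions must be aligned")
    # Cross-lingual agreement: how often both languages get the same label on the same pair.
    agreement = sum(a == b for a, b in zip(pred_en, pred_nl)) / len(gold)
    return {"acc_en": acc_en, "acc_nl": acc_nl, "gap": acc_en - acc_nl, "agreement": agreement}

# Hypothetical toy labels for four aligned pairs.
gold = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
pred_en = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "ENTAILMENT"]
pred_nl = ["ENTAILMENT", "ENTAILMENT", "CONTRADICTION", "ENTAILMENT"]

print(compare_parallel(gold, pred_en, pred_nl))
# {'acc_en': 0.75, 'acc_nl': 0.5, 'gap': 0.25, 'agreement': 0.75}
```

A positive `gap` corresponds to the pattern reported above, where a model does worse on the Dutch side. The `agreement` score shows whether the errors fall on the same pairs in both languages.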
Code & Models
- 🤗 DTAI-KULeuven/robbertje-1-gb-bort · model · 530 downloads
- 🤗 DTAI-KULeuven/robbertje-1-gb-merged · model · 27 downloads
- 🤗 DTAI-KULeuven/robbertje-1-gb-non-shuffled · model · 14 downloads
- 🤗 DTAI-KULeuven/robbertje-1-gb-shuffled · model · 617 downloads
- 🤗 jirmauritz/robbert-v2-dutch-base · model · 9 downloads
- 🤗 pdelobelle/robbert-v2-dutch-base · model · 28k downloads · ♡ 34
