Can current NLI systems handle German word order? Investigating language model performance on a new German challenge set of minimal pairs

Ines Reinig; Katja Markert

arXiv:2306.04523·cs.CL·June 8, 2023·2 cites

TL;DR

This paper introduces WOGLI, a challenging German NLI dataset focusing on word order variations, revealing current models' struggles with language-specific phenomena and highlighting the need for targeted training data.

Contribution

The creation of WOGLI, the first adversarial German NLI dataset emphasizing word order and morphological cues, and an analysis of model performance on this new challenge set.

Findings

01

Current models struggle with German word order variations.

02

Data augmentation improves model performance.

03

Translated NLI datasets do not capture all language phenomena.

Abstract

Compared to English, German word order is freer and therefore poses additional challenges for natural language inference (NLI). We create WOGLI (Word Order in German Language Inference), the first adversarial NLI dataset for German word order that has the following properties: (i) each premise has an entailed and a non-entailed hypothesis; (ii) premise and hypotheses differ only in word order and necessary morphological changes to mark case and number. In particular, each premise and its two hypotheses contain exactly the same lemmata. Our adversarial examples require the model to use morphological markers in order to recognise or reject entailment. We show that current German autoencoding models fine-tuned on translated NLI data can struggle on this challenge set, reflecting the fact that translated NLI datasets will not mirror all necessary language phenomena in the target language. We…
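
The two dataset properties described in the abstract can be illustrated with a minimal sketch. The sentence triple below is a hypothetical example written for illustration, not an item taken from WOGLI, and the field names (`premise`, `entailed`, `non_entailed`) are assumptions rather than the repository's actual format. A toy lemma table stands in for a real German lemmatizer. The sketch checks that all three sentences share the same lemma multiset and shows why a purely lexical similarity cannot separate the two hypotheses: the only signal is which noun phrase carries the nominative article "der" and which the accusative "den".

```python
from collections import Counter

# Toy lemma table (hypothetical): maps lowercased inflected forms to lemmas.
# A real pipeline would use a German lemmatizer instead of this dictionary.
LEMMAS = {
    "der": "der", "den": "der",  # nominative and accusative article share a lemma
    "hund": "Hund",
    "mann": "Mann",
    "beißt": "beißen",
}


def tokens(sentence: str) -> list[str]:
    """Lowercase, drop the final period, and split on whitespace."""
    return sentence.lower().rstrip(".").split()


def lemma_bag(sentence: str) -> Counter:
    """Multiset of lemmas occurring in a sentence."""
    return Counter(LEMMAS[tok] for tok in tokens(sentence))


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lemma sets: a purely lexical similarity score."""
    sa, sb = set(lemma_bag(a)), set(lemma_bag(b))
    return len(sa & sb) / len(sa | sb)


# Hypothetical WOGLI-style item (not from the dataset).
item = {
    "premise": "Der Hund beißt den Mann.",       # The dog bites the man.
    "entailed": "Den Mann beißt der Hund.",      # Object fronted; same meaning.
    "non_entailed": "Der Mann beißt den Hund.",  # Case marking swaps the roles.
}

# Property (ii): premise and both hypotheses contain exactly the same lemmata.
same_lemmas = (
    lemma_bag(item["premise"])
    == lemma_bag(item["entailed"])
    == lemma_bag(item["non_entailed"])
)
print(same_lemmas)

# A lexical-overlap heuristic gives both hypotheses the same score.
print(word_overlap(item["premise"], item["entailed"]),
      word_overlap(item["premise"], item["non_entailed"]))
```

Because both hypotheses score identically on any bag-of-lemmas measure, a model can only get such an item right by reading the case marking on the articles together with word order, which is the behaviour the challenge set is designed to probe.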

Code & Models

Repositories

ireinig/wogli (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification