Polish Natural Language Inference and Factivity -- an Expert-based Dataset and Benchmarks

Daniel Ziembicki; Anna Wróblewska; Karolina Seweryn

arXiv:2201.03521 · cs.CL · June 21, 2023 · 1 citation

TL;DR

This paper introduces a new Polish NLI dataset focused on factivity, evaluates transformer models and models built on linguistic features, and highlights the remaining difficulty of complex cases such as entailment and non-factive verbs.

Contribution

It provides the first expert-annotated Polish factivity NLI dataset and benchmarks the performance of BERT-based models and linguistic features on this task.

Findings

01

BERT-based models achieved around 89% F1 score.

02

Linguistic feature-based models achieved around 91% F1 score.

03

Complex cases such as entailment and non-factive verbs remain challenging.

Abstract

Despite recent breakthroughs in Machine Learning for Natural Language Processing, Natural Language Inference (NLI) problems still constitute a challenge. To this end, we contribute a new dataset that focuses exclusively on the factivity phenomenon; however, our task remains the same as other NLI tasks, i.e. prediction of entailment, contradiction or neutral (ECN). The dataset contains entirely natural language utterances in Polish and gathers 2,432 verb-complement pairs and 309 unique verbs. The dataset is based on the National Corpus of Polish (NKJP) and is a representative sample with regard to the frequency of main verbs and other linguistic features (e.g. occurrence of internal negation). We found that transformer BERT-based models working on sentences obtained relatively good results ($\approx 89\%$ F1 score). Even though better results were achieved using linguistic features…
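
To make the reported metric concrete, here is a minimal sketch of how an F1 score over the three ECN labels could be computed. The abstract does not state which averaging scheme was used, so the macro average below is an assumption, and the function names and toy labels are invented for illustration rather than taken from the dataset.

```python
from collections import Counter

# The three NLI labels named in the abstract: entailment, contradiction, neutral.
LABELS = ("E", "C", "N")


def per_class_f1(gold, pred, labels=LABELS):
    """Return {label: F1} for parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 (assumed averaging; not confirmed by the source)."""
    scores = per_class_f1(gold, pred, labels)
    return sum(scores.values()) / len(labels)


# Toy labels invented for illustration; not drawn from the dataset.
gold = ["E", "E", "N", "C", "N", "E"]
pred = ["E", "N", "N", "C", "N", "E"]

scores = per_class_f1(gold, pred)
overall = macro_f1(gold, pred)
```

Per-class scores are worth inspecting alongside the average, since a single aggregate can hide weak performance on one label.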

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Linguistics and terminology studies