Investigating Reasons for Disagreement in Natural Language Inference
Nan-Jiang Jiang, Marie-Catherine de Marneffe

TL;DR
This paper investigates why annotators disagree on natural language inference (NLI) items, proposes a taxonomy of disagreement sources, and compares two modeling approaches for detecting items likely to elicit disagreement.
Contribution
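It introduces a taxonomy of 10 disagreement sources in NLI, grouped into 3 high-level classes, and evaluates two modeling approaches, finding that multilabel classification is more expressive and gives better recall of the possible interpretations than 4-way classification.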
Findings
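Multilabel classification is more expressive than 4-way classification (the three standard NLI labels plus "Complicated") and gives better recall of the possible interpretations in the data.
Some disagreements stem from uncertainty in the sentence meaning, others from annotator biases and task artifacts.
A taxonomy of 10 disagreement sources, spanning 3 high-level classes, was developed.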
Abstract
We investigate how disagreement in natural language inference (NLI) annotation arises. We developed a taxonomy of disagreement sources with 10 categories spanning 3 high-level classes. We found that some disagreements are due to uncertainty in the sentence meaning, others to annotator biases and task artifacts, leading to different interpretations of the label distribution. We explore two modeling approaches for detecting items with potential disagreement: a 4-way classification with a "Complicated" label in addition to the three standard NLI labels, and a multilabel classification approach. We found that the multilabel classification is more expressive and gives better recall of the possible interpretations in the data.
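The difference between the two target formats described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' procedure: the function names, the per-item vote counts, and the 0.2 share threshold used to decide which labels count as plausible interpretations are all hypothetical.

```python
from typing import Dict, List

# The three standard NLI labels; "Complicated" is the extra 4-way label.
LABELS = ["Entailment", "Neutral", "Contradiction"]


def multilabel_target(votes: Dict[str, int], min_share: float = 0.2) -> List[int]:
    """Binary vector over LABELS: 1 if the label received at least
    `min_share` of the annotator votes (threshold is an assumption)."""
    total = sum(votes.values())
    if total == 0:
        raise ValueError("item has no annotations")
    return [1 if votes.get(label, 0) / total >= min_share else 0 for label in LABELS]


def four_way_target(votes: Dict[str, int], min_share: float = 0.2) -> str:
    """Single label: the one plausible label, or "Complicated" when
    more than one (or none) passes the threshold."""
    active = [l for l, on in zip(LABELS, multilabel_target(votes, min_share)) if on]
    return active[0] if len(active) == 1 else "Complicated"


def label_recall(gold: List[int], pred: List[int]) -> float:
    """Fraction of gold-active labels that the prediction also marks active."""
    gold_on = sum(gold)
    hits = sum(1 for g, p in zip(gold, pred) if g and p)
    return hits / gold_on if gold_on else 0.0


# Hypothetical item on which annotators split between two readings.
split_votes = {"Entailment": 60, "Neutral": 35, "Contradiction": 5}
split_multilabel = multilabel_target(split_votes)
split_four_way = four_way_target(split_votes)
```

In this sketch the multilabel target keeps both plausible labels for the split item, while the 4-way target collapses them into "Complicated" and no longer says which interpretations are in play. That loss of detail is one way to read the abstract's claim that the multilabel formulation is more expressive.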
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
