PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition
Sihao Chen, Senaka Buthpitiya, Alex Fabrikant, Dan Roth, and Tal Schuster

TL;DR
PropSegmEnt introduces a large annotated corpus for proposition-level segmentation and entailment recognition. It enables finer-grained analysis of entailment relations within sentences for natural language inference (NLI).
Contribution
The paper presents a novel dataset with proposition-level annotations and establishes baseline models for segmentation and entailment tasks, advancing fine-grained NLI research.
Findings
Strong baseline performance on segmentation and entailment tasks
Potential for improved NLI explainability and compositionality analysis
Usefulness demonstrated in summary hallucination detection
Abstract
The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e. whether the entirety of its meaning can be inferred from the other. In current NLI datasets and models, textual entailment relations are typically defined at the sentence or paragraph level. However, even a simple sentence often contains multiple propositions, i.e. distinct units of meaning conveyed by the sentence. As these propositions can carry different truth values in the context of a given premise, we argue for the need to recognize the textual entailment relation of each proposition in a sentence individually. We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters. Our dataset structure resembles the tasks of (1) segmenting sentences within a document into the set of propositions, and (2)…
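The abstract's central point, that a single sentence-level label can hide propositions with different truth values, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names, the three-way label set, and the rule for aggregating proposition labels into a sentence label are hypothetical and are not the corpus's actual schema or the authors' method.

```python
from dataclasses import dataclass, field
from enum import Enum


class Label(Enum):
    # Hypothetical three-way label set, assumed for illustration only.
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass
class Proposition:
    text: str
    label: Label  # entailment label with respect to some premise document


@dataclass
class SegmentedSentence:
    sentence: str
    propositions: list[Proposition] = field(default_factory=list)

    def sentence_label(self) -> Label:
        """Collapse proposition labels into one sentence-level label.

        Assumed rule: any contradicted proposition makes the whole sentence
        contradicted; otherwise the sentence is entailed only if every
        proposition is entailed, and neutral in all remaining cases.
        """
        labels = [p.label for p in self.propositions]
        if not labels:
            raise ValueError("sentence has no propositions")
        if Label.CONTRADICTION in labels:
            return Label.CONTRADICTION
        if all(lab is Label.ENTAILMENT for lab in labels):
            return Label.ENTAILMENT
        return Label.NEUTRAL

    def unsupported(self) -> list[str]:
        """Return the propositions that the premise does not entail."""
        return [p.text for p in self.propositions if p.label is not Label.ENTAILMENT]


# Hypothetical example: the premise supports the founding year but says
# nothing about profits, so the sentence as a whole is only "neutral".
example = SegmentedSentence(
    sentence="The company, founded in 1998, reported record profits.",
    propositions=[
        Proposition("The company was founded in 1998.", Label.ENTAILMENT),
        Proposition("The company reported record profits.", Label.NEUTRAL),
    ],
)
print(example.sentence_label())  # Label.NEUTRAL
print(example.unsupported())     # ['The company reported record profits.']
```

The sentence-level label alone reports only "neutral", while the proposition-level view shows which part is supported and which is not. That per-proposition breakdown is the kind of signal one would want for tasks such as flagging unsupported content in a summary.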
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
