Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations
Daniela Brook Weiss, Paul Roit, Ori Ernst, Ido Dagan

TL;DR
This paper significantly extends a sentence fusion dataset by tripling its size and improving its diversity, thereby enhancing model training for multi-document summarization and redundancy detection tasks.
Contribution
The authors revisited and expanded an existing sentence fusion dataset, making it larger, more diverse, and more representative for multi-document NLP tasks.
Findings
The extended dataset is three times the size of a notable earlier dataset.
The new dataset improves model training effectiveness.
More diverse and representative texts enhance multi-document summarization.
Abstract
NLP models that compare or consolidate information across multiple documents often struggle when challenged with recognizing substantial information redundancies across the texts. For example, in multi-document summarization it is crucial to identify salient information across texts and then generate a non-redundant summary, while facing repeated and usually differently-phrased salient content. To facilitate researching such challenges, the sentence-level task of sentence fusion was proposed, yet previous datasets for this task were very limited in their size and scope. In this paper, we revisit and substantially extend previous dataset creation efforts. With careful modifications, relabeling and employing complementing data sources, we were able to triple the size of a notable earlier dataset. Moreover, we show that our extended version uses more representative texts for…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
