Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

Daniela Brook Weiss; Paul Roit; Ori Ernst; Ido Dagan

arXiv:2110.04517 · cs.CL · October 12, 2021

TL;DR

This paper extends an existing sentence fusion dataset, tripling its size and increasing its diversity, to better support model training for multi-document tasks, such as summarization, that require recognizing redundant information across texts.

Contribution

The authors revisited and expanded an existing sentence fusion dataset, making it larger, more diverse, and more representative of multi-document NLP tasks.

Findings

01

The extended dataset is three times the size of the earlier dataset it builds on.

02

The larger, more diverse training set improves model training.

03

The extended dataset uses texts that are more representative of multi-document tasks.

Abstract

NLP models that compare or consolidate information across multiple documents often struggle when challenged with recognizing substantial information redundancies across the texts. For example, in multi-document summarization it is crucial to identify salient information across texts and then generate a non-redundant summary, while facing repeated and usually differently phrased salient content. To facilitate research on such challenges, the sentence-level task of sentence fusion was proposed, yet previous datasets for this task were very limited in their size and scope. In this paper, we revisit and substantially extend previous dataset creation efforts. With careful modifications, relabeling, and complementary data sources, we were able to triple the size of a notable earlier dataset. Moreover, we show that our extended version uses more representative texts for…

Code & Models

Repositories

danielabweiss/extending-sentence-fusion-resources (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques