The Power of Summary-Source Alignments

Ori Ernst, Ori Shapira, Aviv Slobodkin, Sharon Adar, Mohit Bansal, Jacob Goldberger, Ran Levy, and Ido Dagan

arXiv:2406.00842 · cs.CL · June 4, 2024


TL;DR

This paper extends summary-source alignment for multi-document summarization to the proposition level, annotates the alignments manually, derives datasets for multiple tasks from them, and provides baselines to support further research in the field.

Contribution

The paper introduces a fine-grained, proposition-level alignment framework for the multi-document setting, manually annotates summary-source alignments, and derives from them datasets for at least six summarization-related tasks.

Findings

01

Manual alignment improves dataset quality for summarization tasks.

02

Multiple datasets enable benchmarking across six different tasks.

03

Baseline models demonstrate the utility of the proposed datasets.

Abstract

Multi-document summarization (MDS) is a challenging task, often decomposed into subtasks of salience and redundancy detection, followed by text generation. In this context, alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data for some of the component tasks. Yet, this enabling alignment step has usually been applied heuristically on the sentence level on a limited number of subtasks. In this paper, we propose extending the summary-source alignment framework by (1) applying it at the more fine-grained proposition span level, (2) annotating alignment manually in a multi-document setup, and (3) revealing the great potential of summary-source alignments to yield several datasets for at least six different tasks. Specifically, for each of the tasks, we release a manually annotated test set that was derived…
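The abstract's claim that a single set of summary-source alignments can yield data for several subtasks (salience detection, redundancy detection, and generation) can be illustrated with a toy sketch. The `Alignment` record, the helper functions `salience_labels`, `redundancy_clusters`, and `generation_pairs`, and the example sentences are all hypothetical; they are not taken from the paper or the SPARK repository, and the way each task is read off the alignments here is an assumption, not the authors' exact dataset construction.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record: one summary proposition aligned to one source span."""
    summary_span: str  # proposition from the reference summary
    doc_id: str        # identifier of the source document
    source_span: str   # aligned proposition span in that document


def salience_labels(source_spans, alignments):
    """Label each (doc_id, span) pair 1 if some summary proposition aligns to it, else 0."""
    aligned = {(a.doc_id, a.source_span) for a in alignments}
    return {span: int(span in aligned) for span in source_spans}


def redundancy_clusters(alignments):
    """Group source spans aligned to the same summary proposition.

    Assumption: spans from different documents that support the same summary
    proposition are treated as expressing the same (redundant) information.
    """
    clusters = defaultdict(list)
    for a in alignments:
        clusters[a.summary_span].append((a.doc_id, a.source_span))
    return dict(clusters)


def generation_pairs(clusters):
    """Turn each cluster into an (input source spans, target summary proposition) pair."""
    return [(sources, summary) for summary, sources in clusters.items()]


# Toy input, invented for illustration only.
alignments = [
    Alignment("A storm hit the coast.", "d1", "A powerful storm struck the coast on Monday"),
    Alignment("A storm hit the coast.", "d2", "The storm made landfall along the coastline"),
    Alignment("Thousands lost power.", "d2", "About 5,000 homes were left without electricity"),
]
source_spans = [
    ("d1", "A powerful storm struck the coast on Monday"),
    ("d1", "Local schools will reopen next week"),
    ("d2", "The storm made landfall along the coastline"),
    ("d2", "About 5,000 homes were left without electricity"),
]

labels = salience_labels(source_spans, alignments)
clusters = redundancy_clusters(alignments)
pairs = generation_pairs(clusters)
```

Because all three outputs are read off the same alignment list, the sketch mirrors the abstract's point that one annotation effort can serve several tasks: the unaligned span about schools is labeled non-salient, and the two storm spans from `d1` and `d2` form one redundancy cluster that also becomes one generation example.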


Code & Models

Repositories

oriern/SPARK (official)

Videos

The Power of Summary-Source Alignments (Underline)
