How "Multi" is Multi-Document Summarization?

Ruben Wolhandler; Arie Cattan; Ori Ernst; Ido Dagan

arXiv:2210.12688 · cs.CL · October 25, 2022

TL;DR

This paper introduces an automated metric that measures how much a multi-document summary relies on multiple sources. It reveals that the reference summaries in several datasets, as well as the outputs of state-of-the-art systems, often draw on a single document rather than combining information dispersed across documents.

Contribution

The paper proposes a novel metric to quantify the dispersion of source information in multi-document summaries and applies it to analyze existing datasets and systems.

Findings

01

Many datasets contain summaries that are mostly covered by a single document.

02

State-of-the-art systems often do not effectively combine information from multiple sources.

03

The proposed metric can guide the development of more multi-source-aware summarization models.
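Finding 01 can be made concrete with a simple single-document coverage ratio. The sketch below is hypothetical and is not the paper's implementation: it assumes each summary has already been split into "content units" and aligned to the source documents that support them (the `doc_coverage` mapping, produced by some upstream alignment step that is not shown), and the 0.8 threshold for "mostly covered" is an illustrative choice, not a value from the paper.

```python
from typing import Dict, List, Set, Tuple

# One example = (summary content-unit IDs, {document ID: units that document supports}).
# Both the unit granularity and the alignment are assumptions of this sketch.
Example = Tuple[Set[str], Dict[str, Set[str]]]


def best_single_doc_coverage(summary_units: Set[str], doc_coverage: Dict[str, Set[str]]) -> float:
    """Fraction of the summary's units supported by its single best document."""
    if not summary_units:
        return 0.0
    best = max((len(units & summary_units) for units in doc_coverage.values()), default=0)
    return best / len(summary_units)


def share_mostly_single_doc(examples: List[Example], threshold: float = 0.8) -> float:
    """Share of examples whose best single document covers >= threshold of the summary.

    The default threshold is a hypothetical illustration, not from the paper.
    """
    if not examples:
        return 0.0
    hits = sum(1 for units, cov in examples if best_single_doc_coverage(units, cov) >= threshold)
    return hits / len(examples)


units = {"u1", "u2", "u3", "u4"}
single_source = (units, {"d1": {"u1", "u2", "u3", "u4"}, "d2": {"u2"}})
dispersed = (units, {"d1": {"u1", "u2"}, "d2": {"u3", "u4"}})
print(share_mostly_single_doc([single_source, dispersed]))  # 0.5
```

A dataset where this share is high would be one whose summaries, in the sense of Finding 01, barely need more than one input document.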

Abstract

The task of multi-document summarization (MDS) aims at models that, given multiple documents as input, are able to generate a summary that combines dispersed information, originally spread across these documents. Accordingly, it is expected that both reference summaries in MDS datasets, as well as system summaries, would indeed be based on such dispersed information. In this paper, we argue for quantifying and assessing this expectation. To that end, we propose an automated measure for evaluating the degree to which a summary is "disperse", in the sense of the number of source documents needed to cover its content. We apply our measure to empirically analyze several popular MDS datasets, with respect to their reference summaries, as well as the output of state-of-the-art systems. Our results show that certain MDS datasets barely require combining information from multiple documents,…
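The notion of "the number of source documents needed to cover" a summary's content can be illustrated as a set-cover problem. The sketch below is hypothetical and is not the paper's actual measure: it assumes the summary is already split into content units and that an upstream alignment step (not shown) yields `doc_coverage`, the set of units each document supports. It then uses the standard greedy set-cover heuristic, which only approximates the true minimum number of documents.

```python
from typing import Dict, List, Set


def greedy_documents_needed(summary_units: Set[str], doc_coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick documents until every coverable summary unit is covered.

    summary_units: IDs of the summary's content units (hypothetical unit of analysis).
    doc_coverage: document ID -> summary units it supports (assumed alignment output).
    Returns the chosen document IDs in selection order; len() of the result is
    a greedy (approximate) count of the documents the summary draws on.
    """
    # Units that no document supports can never be covered, so drop them up front.
    uncovered = set(summary_units) & set().union(*doc_coverage.values())
    chosen: List[str] = []
    while uncovered:
        # Take the document covering the most still-uncovered units;
        # iterating over sorted IDs makes ties resolve to the smallest ID.
        best = max(sorted(doc_coverage), key=lambda d: len(doc_coverage[d] & uncovered))
        chosen.append(best)
        uncovered -= doc_coverage[best]
    return chosen


coverage = {
    "doc_a": {"u1", "u2", "u3"},
    "doc_b": {"u3", "u4"},
    "doc_c": {"u5"},
}
picked = greedy_documents_needed({"u1", "u2", "u3", "u4", "u5"}, coverage)
print(len(picked), picked)  # 3 ['doc_a', 'doc_b', 'doc_c']
```

Under this framing, a summary whose units are all supported by one document scores 1, while a genuinely "disperse" summary needs several documents.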

Code & Models

Repositories

ariecattan/multi_mds
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques · Natural Language Processing Techniques