TL;DR
This paper evaluates multi-document summarization (MDS) corpora and analyzes system biases, showing how variability across datasets affects system performance and evaluation metrics.
Contribution
It introduces a framework for assessing MDS corpora quality, analyzes reasons for inconsistent system performance, and examines how corpus properties influence bias and evaluation.
Findings
Corpora vary significantly in overlap and conflict levels.
System performance is inconsistent across different datasets.
Corpus properties influence bias propagation in summarization systems.
Abstract
Multi-document summarization (MDS) is the task of reflecting key points from any set of documents into a concise text paragraph. In the past, it has been used to aggregate news, tweets, product reviews, etc. from various sources. Because the task has no standard definition, we encounter a plethora of datasets with varying levels of overlap and conflict between participating documents. There is also no standard regarding what constitutes summary information in MDS. Adding to the challenge, new systems report results on a set of chosen datasets, which might not correlate with their performance on other datasets. In this paper, we study this heterogeneous task with the help of a few widely used MDS corpora and a suite of state-of-the-art models. We make an attempt to quantify the quality of a summarization corpus and prescribe a list of points to consider while proposing…
