Corpora Evaluation and System Bias Detection in Multi-document Summarization

Alvin Dey; Tanya Chowdhury; Yash Kumar Atri; Tanmoy Chakraborty

arXiv:2010.01786 · cs.CL · October 6, 2020

TL;DR

This paper evaluates multi-document summarization corpora and analyzes system biases, highlighting the variability in datasets and the impact on system performance and evaluation metrics.

Contribution

It introduces a framework for assessing MDS corpora quality, analyzes reasons for inconsistent system performance, and examines how corpus properties influence bias and evaluation.

Findings

1. Corpora vary significantly in overlap and conflict levels.
2. System performance is inconsistent across different datasets.
3. Corpus properties influence bias propagation in summarization systems.

Abstract

Multi-document summarization (MDS) is the task of reflecting key points from any set of documents into a concise text paragraph. In the past, it has been used to aggregate news, tweets, product reviews, etc. from various sources. Because the task has no standard definition, we encounter a plethora of datasets with varying levels of overlap and conflict between participating documents. There is also no standard regarding what constitutes summary information in MDS. Adding to the challenge, new systems report results on a chosen set of datasets, and those results might not correlate with their performance on other datasets. In this paper, we study this heterogeneous task with the help of a few widely used MDS corpora and a suite of state-of-the-art models. We attempt to quantify the quality of a summarization corpus and prescribe a list of points to consider while proposing…

Code & Models

Repositories

LCS2-IIITD/summarization_bias
none · Official