Neural Text Summarization: A Critical Evaluation

Wojciech Kryściński; Nitish Shirish Keskar; Bryan McCann; Caiming Xiong; Richard Socher

arXiv:1908.08960·cs.CL·August 27, 2019

TL;DR

This paper critically examines the current state of neural text summarization, highlighting issues with datasets, evaluation metrics, and model biases that hinder genuine progress and reliable assessment.

Contribution

It provides a comprehensive critique of existing datasets, evaluation protocols, and model behaviors, emphasizing the need for improved benchmarks and evaluation methods.

Findings

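01 Automatically collected datasets are often noisy and leave the task underconstrained.

02 Current evaluation metrics correlate only weakly with human judgment and do not account for factual correctness.

03 Models overfit to the layout biases of current datasets and offer limited diversity in their outputs.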

Abstract

Text summarization aims at compressing long documents into a shorter form that conveys the most important parts of the original document. Despite increased interest in the community and notable research effort, progress on benchmark datasets has stagnated. We critically evaluate key ingredients of the current research setup: datasets, evaluation metrics, and models, and highlight three primary shortcomings: 1) automatically collected datasets leave the task underconstrained and may contain noise detrimental to training and evaluation, 2) current evaluation protocol is weakly correlated with human judgment and does not account for important characteristics such as factual correctness, 3) models overfit to layout biases of current datasets and offer limited diversity in their outputs.
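
To make the second and third shortcomings concrete, here is a minimal Python sketch, not the authors' code. It assumes a news-style layout in which the opening sentences carry the key content, and it scores a "lead-k" baseline (the first k sentences, copied verbatim) with a simplified unigram-overlap F1. The helper names `lead_k` and `unigram_f1` are hypothetical, the tokenization is plain whitespace splitting, and the score only approximates ROUGE-1 rather than reproducing the official ROUGE toolkit.

```python
from collections import Counter
from typing import Sequence


def lead_k(sentences: Sequence[str], k: int = 3) -> str:
    """Hypothetical layout-only baseline: return the first k sentences joined by spaces."""
    return " ".join(sentences[:k])


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 over lowercased, whitespace-split tokens.

    Assumption: no stemming or stopword handling, unlike the official toolkit.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy article and reference, invented purely for illustration.
    article = [
        "The council approved the new budget on Monday.",
        "The vote passed by a narrow margin.",
        "Several members raised concerns about school funding.",
        "The meeting lasted four hours.",
    ]
    reference = "The council approved the budget by a narrow margin."
    baseline = lead_k(article, k=2)
    print(baseline)
    print(round(unigram_f1(baseline, reference), 3))
```

A baseline like this reads nothing beyond sentence position, so a strong overlap score for it on a given dataset points to layout bias rather than understanding. The score also counts shared tokens only, which means a summary that reuses the reference vocabulary while misstating a fact can still score well.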
