Neural Text Summarization: A Critical Evaluation
Wojciech Kryściński, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher

TL;DR
This paper critically examines the current state of neural text summarization, highlighting issues with datasets, evaluation metrics, and model biases that hinder genuine progress and reliable assessment.
Contribution
It provides a comprehensive critique of existing datasets, evaluation protocols, and model behaviors, emphasizing the need for improved benchmarks and evaluation methods.
Findings
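Automatically collected datasets leave the task underconstrained and may contain noise that harms training and evaluation.
The current evaluation protocol correlates only weakly with human judgment and ignores properties such as factual correctness.
Models overfit to the layout biases of current datasets and produce outputs with limited diversity.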
Abstract
Text summarization aims at compressing long documents into a shorter form that conveys the most important parts of the original document. Despite increased interest in the community and notable research effort, progress on benchmark datasets has stagnated. We critically evaluate key ingredients of the current research setup: datasets, evaluation metrics, and models, and highlight three primary shortcomings: 1) automatically collected datasets leave the task underconstrained and may contain noise detrimental to training and evaluation, 2) current evaluation protocol is weakly correlated with human judgment and does not account for important characteristics such as factual correctness, 3) models overfit to layout biases of current datasets and offer limited diversity in their outputs.
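To make the second shortcoming concrete, the sketch below shows one way to check how well an automatic score tracks human ratings. It is an illustration only, not the authors' protocol: `unigram_f1` is a hypothetical, simplified word-overlap score standing in for whatever automatic metric is being examined, and `pearson` is a plain implementation of the Pearson correlation coefficient. Any summaries and ratings passed to these functions would have to come from a real annotation study.

```python
from collections import Counter
from math import sqrt


def unigram_f1(candidate: str, reference: str) -> float:
    """F1 over shared lowercase word tokens (a simplified overlap score, assumed for illustration)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def pearson(xs, ys) -> float:
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def metric_human_correlation(candidates, references, human_scores) -> float:
    """Correlate per-summary overlap scores with per-summary human ratings."""
    metric_scores = [unigram_f1(c, r) for c, r in zip(candidates, references)]
    return pearson(metric_scores, human_scores)
```

A score of this kind also shows why factual correctness can go unnoticed: the candidate "the company reported a profit" shares four of five words with the reference "the company reported a loss", so its overlap score is high (about 0.8) even though it states the opposite fact.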
