SummEval: Re-evaluating Summarization Evaluation

Alexander R. Fabbri; Wojciech Kryściński; Bryan McCann; Caiming Xiong; Richard Socher; Dragomir Radev

arXiv:2007.12626·cs.CL·February 3, 2021

TL;DR

This paper critically re-evaluates 14 automatic summarization metrics, benchmarks 23 models, and provides a comprehensive dataset and toolkit to improve evaluation practices and align them more closely with human judgments.

Contribution

The paper offers a thorough re-assessment of automatic evaluation metrics, a large dataset of model summaries with human judgments, and an extensible toolkit to advance summarization evaluation research.

Findings

1) 14 metrics re-evaluated comprehensively
2) Benchmarking of 23 recent models conducted
3) Largest collection of human judgments assembled

Abstract

The scarcity of comprehensive up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization model outputs along with expert and crowd-sourced human annotations, 2) we consistently benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics, 3) we assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format, 4) we implement and share a toolkit that provides an extensible and unified API for evaluating summarization models across a broad range of automatic metrics, 5) we…
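As an illustration of what re-evaluating a metric against human annotations can involve, the sketch below computes a rank correlation between automatic metric scores and human ratings for the same set of summaries. It is a minimal sketch under stated assumptions, not the paper's toolkit: the choice of Kendall's tau (tau-a variant), the function name `kendall_tau`, and all score values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # Positive product: both scores order the pair the same way.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for five summaries (made-up numbers, not from the paper).
metric_scores = [0.41, 0.35, 0.52, 0.28, 0.47]  # e.g. an automatic metric
human_scores = [3.7, 3.0, 4.3, 3.3, 4.0]        # e.g. averaged annotator ratings

tau = kendall_tau(metric_scores, human_scores)
print(f"Kendall tau between metric and human scores: {tau:.2f}")
```

A value near 1 would mean the metric ranks summaries much as the annotators do; a value near 0 would mean the metric's ranking carries little information about human preference.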
