How Far are We from Robust Long Abstractive Summarization?

Huan Yee Koh; Jiaxin Ju; He Zhang; Ming Liu; Shirui Pan

arXiv:2210.16732·cs.CL·November 1, 2022

TL;DR

This paper evaluates the current state of long document abstractive summarization, highlighting the gap between relevance and factual accuracy, and proposes directions for developing better factuality metrics.

Contribution

The paper provides a fine-grained human-annotated dataset and an analysis of long document summarization models and evaluation metrics. It reveals limitations of ROUGE and of existing factuality metrics, and suggests directions for future research.

Findings

01. ROUGE effectively measures relevance but not factuality.

02. Current factuality metrics have significant limitations.

03. BARTScore shows promising results in factuality evaluation.
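
To make the first finding concrete, the sketch below shows why a lexical-overlap metric can reward a summary that contradicts its reference. It is a minimal illustration only: `rouge1_f1` is a simplified unigram-overlap F1 written for this example (whitespace tokenization, no stemming), not the official ROUGE implementation and not the paper's evaluation setup, and the three sentences are invented.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts.

    Assumption: lowercased whitespace tokens, no stemming or stopword handling.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Invented example sentences (not taken from the paper's dataset).
reference = "the drug reduced mortality in patients"
faithful = "the drug lowered death rates among patients"      # same meaning, different words
contradictory = "the drug increased mortality in patients"    # opposite meaning, same words

score_faithful = rouge1_f1(reference, faithful)
score_contradictory = rouge1_f1(reference, contradictory)

print(f"faithful paraphrase:   {score_faithful:.3f}")
print(f"contradictory summary: {score_contradictory:.3f}")
```

In this toy case the contradictory summary scores higher than the faithful paraphrase because only one token differs from the reference. That is the kind of behavior that motivates pairing ROUGE with a separate factual consistency metric.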

Abstract

Abstractive summarization has made tremendous progress in recent years. In this work, we perform fine-grained human annotations to evaluate long document abstractive summarization systems (i.e., models and metrics) with the aim of implementing them to generate reliable summaries. For long document abstractive models, we show that the constant striving for state-of-the-art ROUGE results can lead us to generate more relevant summaries but not factual ones. For long document evaluation metrics, human evaluation results show that ROUGE remains the best at evaluating the relevance of a summary. They also reveal important limitations of factuality metrics in detecting different types of factual errors and the reasons behind the effectiveness of BARTScore. We then suggest promising directions in the endeavor of developing factual consistency metrics. Finally, we release our annotated long…

Code & Models

Repositories

huankoh/How-Far-are-We-from-Robust-Long-Abstractive-Summarization
Official

Datasets

gigant/robust_long_abstractive_human_annotation
dataset · 16 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques