Annotating and Modeling Fine-grained Factuality in Summarization

Tanya Goyal; Greg Durrett

arXiv:2104.04302·cs.CL·April 12, 2021

TL;DR

This paper studies how to identify factual errors in abstractive summaries, showing the value of fine-grained annotations and using factuality detection models to make generated summaries more accurate.

Contribution

It analyzes factual errors at the word, dependency, and sentence levels and demonstrates that training on human-labeled data yields better factuality detection than training on synthetic data.

Findings

01. Factual errors differ significantly across datasets, and simple synthetic errors do not reflect the errors made on abstractive datasets such as XSum.

02. For factuality detection, human-labeled fine-grained annotations outperform sentence-level annotations and synthetic data.

03. Factuality detection models can improve training data quality for more accurate summarization.
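As a rough illustration of the third finding, the sketch below shows two ways fine-grained factuality flags could be used to clean summarization training data: dropping whole examples, or masking only the flagged tokens out of the training loss. It is a minimal sketch under stated assumptions, not the authors' implementation. The `Example` class, the `nonfactual` flags, and all function names are hypothetical, and the flags are assumed to come from some factuality detector that is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One training pair. All field names here are hypothetical."""
    article: str
    summary_tokens: List[str]
    # Assumed detector output: True marks a summary token flagged as non-factual.
    nonfactual: List[bool]


def summary_is_factual(ex: Example) -> bool:
    """Sentence-level label derived from fine-grained flags:
    the summary counts as factual only if no token is flagged."""
    return not any(ex.nonfactual)


def filter_examples(examples: List[Example]) -> List[Example]:
    """Coarse option: drop every example whose summary has any flagged token."""
    return [ex for ex in examples if summary_is_factual(ex)]


def loss_mask(ex: Example) -> List[float]:
    """Fine-grained option: keep the example, but give flagged tokens
    zero weight so they would not contribute to a token-level loss."""
    return [0.0 if flagged else 1.0 for flagged in ex.nonfactual]


if __name__ == "__main__":
    data = [
        Example("article A", ["the", "mayor", "resigned"], [False, False, False]),
        Example("article B", ["the", "queen", "resigned"], [False, True, False]),
    ]
    print(len(filter_examples(data)))  # number of examples kept by the coarse filter
    print(loss_mask(data[1]))          # per-token weights for the second example
```

The coarse filter discards the entire second example, while the mask keeps its unflagged tokens as training signal; which trade-off is preferable depends on how noisy the dataset is.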

Abstract

Recent pre-trained abstractive summarization systems have started to achieve credible performance, but a major barrier to their use in practice is their propensity to output summaries that are not faithful to the input and that contain factual errors. While a number of annotated datasets and statistical models for assessing factuality have been explored, there is no clear picture of what errors are most important to target or where current techniques are succeeding and failing. We explore both synthetic and human-labeled data sources for training models to identify factual errors in summarization, and study factuality at the word-, dependency-, and sentence-level. Our observations are threefold. First, exhibited factual errors differ significantly across datasets, and commonly-used training sets of simple synthetic errors do not reflect errors made on abstractive datasets like XSum.…

Code & Models

Repositories

Models

aiautomationlab/german-news-title-gen-mt5
model · 38 downloads · 4 likes
