TL;DR
This paper investigates the challenges of ensuring factual accuracy in abstractive summarization, highlighting the importance of fine-grained annotations and proposing models to detect and improve factuality.
Contribution
It analyzes factual errors at the word, dependency, and sentence levels, and shows that training on human-labeled data improves factuality detection over training on synthetic data.
Findings
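Factual errors differ significantly across datasets, and commonly used sets of simple synthetic errors do not reflect the errors made on abstractive datasets like XSum.
Models trained on human-labeled fine-grained annotations detect factual errors better than models trained on sentence-level annotations or synthetic data.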
Factuality detection models can improve training data quality for more accurate summarization.
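As a rough illustration of how the last finding could be applied, the sketch below drops summary tokens flagged as non-factual from a token-level training loss. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `masked_nll`, the toy probabilities, and the per-token flags are all hypothetical, and the flags are assumed to come from some factuality detector.

```python
import math
from typing import List


def masked_nll(token_probs: List[float], is_factual: List[bool]) -> float:
    """Average negative log-likelihood over tokens flagged as factual.

    token_probs: model probability assigned to each reference summary token
                 (hypothetical values in this sketch).
    is_factual:  per-token flags, assumed to come from a factuality detector.
    """
    if len(token_probs) != len(is_factual):
        raise ValueError("token_probs and is_factual must have the same length")

    # Keep only tokens the detector considers supported by the source document.
    kept = [p for p, ok in zip(token_probs, is_factual) if ok]
    if not kept:
        # Every token was flagged as non-factual, so nothing contributes.
        return 0.0

    return -sum(math.log(p) for p in kept) / len(kept)


# Toy example: the middle token is flagged as non-factual and is ignored.
probs = [0.5, 0.25, 0.5]
flags = [True, False, True]
loss = masked_nll(probs, flags)
```

In this toy case the loss is averaged over the two remaining tokens only, so the flagged token no longer pushes the model toward reproducing it.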
Abstract
Recent pre-trained abstractive summarization systems have started to achieve credible performance, but a major barrier to their use in practice is their propensity to output summaries that are not faithful to the input and that contain factual errors. While a number of annotated datasets and statistical models for assessing factuality have been explored, there is no clear picture of what errors are most important to target or where current techniques are succeeding and failing. We explore both synthetic and human-labeled data sources for training models to identify factual errors in summarization, and study factuality at the word-, dependency-, and sentence-level. Our observations are threefold. First, exhibited factual errors differ significantly across datasets, and commonly-used training sets of simple synthetic errors do not reflect errors made on abstractive datasets like XSum.…
