Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
Alex Wang, Kyunghyun Cho, and Mike Lewis

TL;DR
QAGS is an automatic evaluation protocol that detects factual inconsistencies in a generated summary by asking questions about the summary and its source and comparing the answers. It correlates more strongly with human judgments of factual consistency than other automatic metrics and offers a degree of interpretability.
Contribution
The paper introduces QAGS, a question-answering-based metric for the factual consistency of summaries, and reports that it correlates substantially better with human judgments than existing automatic metrics.
Findings
QAGS correlates better with human judgments of factual consistency than other automatic evaluation metrics on CNN/DailyMail and XSUM.
QAGS provides interpretability by highlighting inconsistent tokens.
QAGS identifies factual errors in model-generated summaries for the CNN/DailyMail and XSUM datasets.
Abstract
Practical applications of abstractive summarization models are limited by frequent factual inconsistencies with respect to their input. Existing automatic evaluation metrics for summarization are largely insensitive to such errors. We propose an automatic evaluation protocol called QAGS (pronounced "kags") that is designed to identify factual inconsistencies in a generated summary. QAGS is based on the intuition that if we ask questions about a summary and its source, we will receive similar answers if the summary is factually consistent with the source. To evaluate QAGS, we collect human judgments of factual consistency on model-generated summaries for the CNN/DailyMail (Hermann et al., 2015) and XSUM (Narayan et al., 2018) summarization datasets. QAGS has substantially higher correlations with these judgments than other automatic evaluation metrics. Also, QAGS offers a natural form of…
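The intuition in the abstract (ask the same questions of the summary and of its source, then compare the answers) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `token_f1` and `qa_consistency_score` are hypothetical, the question generator and question-answering model are passed in as plain callables and replaced here by toy stand-ins, and the use of token-level F1 averaged over questions as the answer-similarity measure is an assumption of this sketch.

```python
from collections import Counter
from typing import Callable, List


def token_f1(a: str, b: str) -> float:
    """Token-level F1 between two answers (lowercased, whitespace tokens)."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        # Two empty answers agree; an empty and a non-empty answer do not.
        return float(ta == tb)
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ta)
    recall = overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def qa_consistency_score(
    source: str,
    summary: str,
    generate_questions: Callable[[str], List[str]],
    answer: Callable[[str, str], str],
) -> float:
    """Average answer agreement over questions generated from the summary."""
    questions = generate_questions(summary)
    if not questions:
        return 0.0
    scores = [
        token_f1(answer(q, source), answer(q, summary)) for q in questions
    ]
    return sum(scores) / len(scores)


# Toy stand-ins (assumptions for illustration only); a real system would
# plug in trained question-generation and question-answering models here.
def toy_questions(summary: str) -> List[str]:
    return ["Where was the company founded?"]


def toy_answer(question: str, text: str) -> str:
    for city in ("Paris", "London"):
        if city in text:
            return city
    return ""


source = "Alice founded the company in Paris in 2010."
consistent = "Alice founded the company in Paris."
inconsistent = "Alice founded the company in London."

print(qa_consistency_score(source, consistent, toy_questions, toy_answer))
print(qa_consistency_score(source, inconsistent, toy_questions, toy_answer))
```

With these toy components the consistent summary yields matching answers and the inconsistent one does not, which is the behaviour the metric relies on. The questions whose answers disagree also point to the part of the summary that is unsupported by the source.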
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
