Using Similarity to Evaluate Factual Consistency in Summaries
Yuxuan Ye, Edwin Simpson, Raul Santos Rodriguez

TL;DR
This paper introduces SBERTScore, a zero-shot, sentence-level similarity metric for evaluating factual consistency in summaries. It outperforms word-word metrics such as BERTScore, competes with NLI- and QA-based methods without fine-tuning, and is effective at identifying correct summaries.
Contribution
The paper proposes SBERTScore, a novel zero-shot similarity metric that improves factuality evaluation by comparing sentences between summaries and source documents, eliminating the need for fine-tuning.
Findings
SBERTScore outperforms BERTScore and other widely used word-word metrics in factuality detection.
Combining multiple evaluation techniques enhances error detection.
SBERTScore is particularly effective in identifying accurate summaries.
Abstract
Cutting-edge abstractive summarisers generate fluent summaries, but the factuality of the generated text is not guaranteed. Early summary factuality evaluation metrics are usually based on n-gram overlap and embedding similarity, but are reported to fail to align with human annotations. Therefore, many techniques for detecting factual inconsistencies build pipelines around natural language inference (NLI) or question-answering (QA) models with additional supervised learning steps. In this paper, we revisit similarity-based metrics, showing that this failure stems from the comparison text selection and its granularity. We propose a new zero-shot factuality evaluation metric, Sentence-BERT Score (SBERTScore), which compares sentences between the summary and the source document. It outperforms widely-used word-word metrics including BERTScore and can compete with existing NLI and QA-based…
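To make the sentence-level comparison concrete, here is a minimal sketch of a sentence-to-sentence similarity score between a summary and its source. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bag-of-words `toy_embed` merely stands in for a Sentence-BERT encoder so the example runs without downloading a model, and the aggregation (best-matching source sentence per summary sentence, then the mean) is an assumed choice that the abstract does not specify.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_embed(sentence: str) -> Vector:
    # ASSUMPTION: bag-of-words counts stand in for a sentence encoder.
    # A real setup would return dense Sentence-BERT embeddings instead.
    return dict(Counter(re.findall(r"[a-z0-9]+", sentence.lower())))


def cosine(u: Vector, v: Vector) -> float:
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_similarity_score(
    source: str,
    summary: str,
    embed: Callable[[str], Vector] = toy_embed,
) -> float:
    # Embed every sentence of the source and of the summary.
    source_vecs = [embed(s) for s in split_sentences(source)]
    summary_vecs = [embed(s) for s in split_sentences(summary)]
    if not source_vecs or not summary_vecs:
        return 0.0
    # ASSUMPTION: score each summary sentence by its best-matching
    # source sentence, then average over the summary sentences.
    best = [max(cosine(s, d) for d in source_vecs) for s in summary_vecs]
    return sum(best) / len(best)


source_doc = "The cat sat on the mat. The dog barked loudly."
supported = sentence_similarity_score(source_doc, "The cat sat on the mat.")
unsupported = sentence_similarity_score(source_doc, "Birds fly south.")
print(supported > unsupported)
```

Because the encoder is passed in as `embed`, the toy function can be replaced by any sentence encoder that returns vectors compatible with the similarity function. The key point the sketch shows is the granularity: whole summary sentences are matched against whole source sentences rather than token against token.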
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
Methods: ALIGN
