Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings
Oleg Vasilyev, John Bohannon

TL;DR
This paper introduces ESTIME, a new reference-free measure for evaluating summary faithfulness by detecting minute inconsistencies with source documents, showing strong correlation with expert scores and sensitivity to subtle errors.
Contribution
The paper presents ESTIME, a novel embedding-based metric for assessing summary-to-text consistency, outperforming existing measures in detecting subtle factual errors.
Findings
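ESTIME correlates with expert scores on the summary-level SummEval dataset more strongly than other common evaluation measures, in both Consistency and Fluency.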
ESTIME is more sensitive to subtle factual errors than existing metrics.
The method effectively detects minute inconsistencies in summaries.
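The first finding concerns summary-level agreement between a metric and expert ratings. A minimal sketch of such a comparison follows, assuming a rank correlation (Kendall's tau without tie correction) is used; the source does not say which coefficient was reported. The scores are invented toy values, not results from the paper, and `kendall_tau` is a hypothetical helper. Because an inconsistency count is lower for better summaries, it is negated before comparison.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs. No tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy data (assumed for illustration): one expert consistency score
# and one inconsistency count per summary.
expert_scores = [5, 4, 2, 1]
inconsistency_counts = [0, 1, 3, 6]

# Negate the counts so that "higher is better" holds for both lists.
tau = kendall_tau(expert_scores, [-c for c in inconsistency_counts])
print(tau)
```

A value near 1 would mean the metric ranks summaries in the same order as the experts; a value near 0 would mean no agreement in ranking.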
Abstract
We propose a new reference-free summary quality evaluation measure, with an emphasis on faithfulness. The measure is designed to find and count all possible minute inconsistencies of the summary with respect to the source document. The proposed ESTIME (Estimator of Summary-to-Text Inconsistency by Mismatched Embeddings) correlates with expert scores on the summary-level SummEval dataset more strongly than other common evaluation measures, not only in Consistency but also in Fluency. We also introduce a method of generating subtle factual errors in human summaries. We show that ESTIME is more sensitive to subtle errors than other common evaluation measures.
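The abstract describes counting inconsistencies by mismatched embeddings without giving the procedure. One plausible reading, assumed here rather than taken from the abstract, is this: for each summary token, find the source-text token whose contextual embedding is most similar, and count a mismatch when that token differs from the summary token. The sketch below uses hand-written two-dimensional vectors in place of real model embeddings; `cosine` and `count_mismatches` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def count_mismatches(summary, text):
    """summary, text: lists of (token, embedding) pairs.

    For each summary token, locate the text token with the most similar
    embedding and count it as a mismatch if the two tokens differ.
    """
    mismatches = 0
    for token, embedding in summary:
        nearest_token, _ = max(text, key=lambda pair: cosine(embedding, pair[1]))
        if nearest_token != token:
            mismatches += 1
    return mismatches

# Toy embeddings (assumed): the context of "paris" in the text points along
# the first axis, the context of "london" along the second.
text = [("paris", [1.0, 0.0]), ("london", [0.0, 1.0])]

# A faithful summary uses "paris" in a paris-like context.
faithful = [("paris", [0.9, 0.1])]
# An unfaithful one puts "london" where the context matches "paris".
unfaithful = [("london", [0.9, 0.1])]

print(count_mismatches(faithful, text), count_mismatches(unfaithful, text))
```

In a real setting the vectors would come from a contextual language model, and the choice of model, layer, and any token masking would affect the count; none of those details are specified in the abstract.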
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
