FactSim: Fact-Checking for Opinion Summarization
Leandro Anghinoni, Jorge Sanchez

TL;DR
This paper introduces FactSim, an automated method for evaluating the factual consistency of opinion summaries. It addresses limitations of existing metrics, particularly for summaries generated by large language models, and shows high correlation with human judgments.
Contribution
The paper presents a novel automated metric for factual assessment in opinion summarization that correlates better with human evaluations than existing methods.
Findings
FactSim accurately measures claim similarity even when claims involve negation or paraphrasing.
The proposed metric shows high correlation with human judgments.
FactSim outperforms state-of-the-art metrics in factual consistency evaluation.
Abstract
We explore the need for more comprehensive and precise evaluation techniques for generative artificial intelligence (GenAI) in text summarization tasks, specifically opinion summarization. Traditional methods, which use automated metrics to compare machine-generated summaries of a collection of opinion pieces (e.g., product reviews), have shown limitations due to the paradigm shift introduced by large language models (LLMs). This paper addresses these shortcomings by proposing a novel, fully automated methodology for assessing the factual consistency of such summaries. The method measures the similarity between the claims in a given summary and those in the original reviews, thereby quantifying the coverage and consistency of the generated summary. To do so, we rely on a simple approach to extract factual assessments from texts, which we then compare and summarize…
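The abstract describes the scoring only at a high level. As a rough illustration, assuming claims have already been extracted as short strings, coverage and consistency can be read off a claim-similarity matrix: consistency as the share of summary claims supported by some review claim, coverage as the share of review claims matched by some summary claim. This is a hypothetical sketch: the function names, the 0.5 threshold, and the token-overlap (Jaccard) similarity are stand-ins chosen for illustration, not FactSim's actual extraction or similarity model.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy stand-in similarity: token-set Jaccard overlap (NOT FactSim's similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def coverage_and_consistency(
    summary_claims: List[str],
    review_claims: List[str],
    sim: Callable[[str, str], float] = jaccard,
    threshold: float = 0.5,  # hypothetical match threshold
) -> Tuple[float, float]:
    """Hypothetical coverage/consistency scores from a claim-similarity matrix."""
    if not summary_claims or not review_claims:
        return 0.0, 0.0
    # matrix[i][j] = similarity between summary claim i and review claim j
    matrix = [[sim(s, r) for r in review_claims] for s in summary_claims]
    # Consistency: fraction of summary claims supported by at least one review claim
    consistency = sum(max(row) >= threshold for row in matrix) / len(summary_claims)
    # Coverage: fraction of review claims matched by at least one summary claim
    coverage = sum(
        max(matrix[i][j] for i in range(len(summary_claims))) >= threshold
        for j in range(len(review_claims))
    ) / len(review_claims)
    return coverage, consistency


# Usage with made-up claims (illustrative data, not from the paper)
summary = ["battery life is great", "screen is dim"]
reviews = ["battery life is great", "the screen is too dim", "shipping was slow"]
coverage, consistency = coverage_and_consistency(summary, reviews)
print(coverage, consistency)  # coverage ≈ 0.667, consistency = 1.0
```

Token overlap is a deliberately weak stand-in: it scores "screen is dim" against "screen is not dim" at 0.75, so it would treat a claim and its negation as a match. A stronger claim-similarity function can be passed through the `sim` parameter without changing the aggregation logic.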
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Misinformation and Its Impacts
