FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction

Alessandro Scirè, Karim Ghonim, Roberto Navigli

arXiv:2403.02270 · cs.CL · September 4, 2024 · 2 citations



TL;DR

FENICE is a new, interpretable, and efficient factuality metric for summarization that uses natural language inference (NLI) and claim extraction to evaluate factual consistency, outperforming existing metrics on standard benchmarks, especially for long summaries.

Contribution

The paper introduces FENICE, a novel factuality evaluation metric combining NLI and claim extraction, addressing interpretability and efficiency issues of prior methods.
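To make the combination of NLI and claim extraction concrete, here is a minimal sketch of how per-claim NLI results could be turned into one summary-level score. It assumes (our assumption, not stated on this page) that each claim is scored as P(entailment) minus P(contradiction), maximised over document passages, and that the summary score is the mean over claims. The names `nli`, `toy_nli`, `claim_alignment`, and `factuality_score` are hypothetical, and `toy_nli` is a word-overlap stand-in rather than a real NLI model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical NLI interface: (premise, hypothesis) -> (p_entail, p_neutral, p_contradict).
NLIFn = Callable[[str, str], Tuple[float, float, float]]


@dataclass
class ClaimResult:
    claim: str
    score: float    # best alignment score for this claim
    evidence: str   # document passage that produced that score


def claim_alignment(claim: str, passages: Sequence[str], nli: NLIFn) -> ClaimResult:
    """Align one claim to the passage that supports it best.

    Assumed scoring rule: P(entailment) - P(contradiction), maximised over passages.
    """
    if not passages:
        raise ValueError("need at least one document passage")
    best_score, best_passage = float("-inf"), ""
    for passage in passages:
        p_ent, _p_neu, p_con = nli(passage, claim)  # passage is premise, claim is hypothesis
        score = p_ent - p_con
        if score > best_score:
            best_score, best_passage = score, passage
    return ClaimResult(claim, best_score, best_passage)


def factuality_score(
    claims: Sequence[str], passages: Sequence[str], nli: NLIFn
) -> Tuple[float, List[ClaimResult]]:
    """Average the per-claim scores; also return per-claim evidence for inspection."""
    if not claims:
        raise ValueError("need at least one claim")
    results = [claim_alignment(c, passages, nli) for c in claims]
    return sum(r.score for r in results) / len(results), results


def _tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def toy_nli(premise: str, hypothesis: str) -> Tuple[float, float, float]:
    """Toy stand-in for an NLI model: 'entailment' = share of hypothesis words in the premise."""
    hyp = _tokens(hypothesis)
    ratio = len(hyp & _tokens(premise)) / len(hyp) if hyp else 0.0
    return ratio, 1.0 - ratio, 0.0  # this toy never predicts contradiction


if __name__ == "__main__":
    passages = ["FENICE uses natural language inference.", "It extracts claims from summaries."]
    claims = ["FENICE uses natural language inference.", "FENICE uses reinforcement learning."]
    score, results = factuality_score(claims, passages, toy_nli)
    print(score)  # 0.75 with the toy NLI above
    for r in results:
        print(f"{r.score:.2f}  {r.claim!r}  <- {r.evidence!r}")
```

Returning the best-aligned passage for every claim, not only the aggregate number, is what makes claim-level scoring easy to inspect: a low-scoring claim points directly at the summary content that lacks support. In practice `toy_nli` would be replaced by a trained NLI model and the claims would come from a claim extractor.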

Findings

1. FENICE achieves state-of-the-art results on the AGGREFACT benchmark.
2. It effectively evaluates factuality in long-form summaries.
3. The method is more interpretable and computationally practical than prior factuality metrics, particularly LLM-based ones.

Abstract

Recent advancements in text summarization, particularly with the advent of Large Language Models (LLMs), have shown remarkable performance. However, a notable challenge persists as a substantial number of automatically generated summaries exhibit factual inconsistencies, such as hallucinations. In response to this issue, various approaches for the evaluation of consistency for summarization have emerged. Yet, these newly introduced metrics face several limitations, including lack of interpretability, focus on short document summaries (e.g., news articles), and computational impracticality, especially for LLM-based metrics. To address these shortcomings, we propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE), a more interpretable and efficient factuality-oriented metric. FENICE leverages an NLI-based alignment between…


Code & Models

Repositories

Babelscape/FENICE (official)

Models

Babelscape/t5-base-summarization-claim-extractor (Hugging Face model · 5.5k downloads · 13 likes)

Videos

FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (Underline)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Sparse Evolutionary Training · Focus