FineSurE: Fine-grained Summarization Evaluation using LLMs

Hwanjun Song; Hang Su; Igor Shalyminov; Jason Cai; Saab Mansour

arXiv:2407.00908 · cs.CL · July 23, 2024

TL;DR

FineSurE introduces a multi-dimensional, fine-grained evaluation method for text summarization using LLMs, addressing limitations of existing metrics by assessing faithfulness, completeness, and conciseness at the sentence level.
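This page does not say how completeness and conciseness are scored. The sketch below rests on an assumption that is not confirmed here. It assumes an LLM first aligns a list of reference keyfacts to the summary sentences, and that both scores are simple proportions over that alignment. Completeness is the share of keyfacts covered by some sentence. Conciseness is the share of sentences that carry some keyfact. The alignment format and the helpers `completeness` and `conciseness` are hypothetical illustrations, not FineSurE's published code.

```python
from typing import Dict, Set

# Hypothetical alignment format: keyfact index -> set of summary sentence
# indices that an LLM judged to contain that keyfact (assumption, not the paper's API).
Alignment = Dict[int, Set[int]]


def completeness(alignment: Alignment, num_keyfacts: int) -> float:
    # Share of keyfacts matched by at least one summary sentence.
    if num_keyfacts <= 0:
        raise ValueError("need at least one keyfact")
    covered = sum(1 for k in range(num_keyfacts) if alignment.get(k))
    return covered / num_keyfacts


def conciseness(alignment: Alignment, num_sentences: int) -> float:
    # Share of summary sentences that carry at least one keyfact.
    if num_sentences <= 0:
        raise ValueError("need at least one summary sentence")
    used = set().union(*alignment.values())
    return len(used) / num_sentences


# Hypothetical example: 4 keyfacts, 3 summary sentences.
# Keyfact 0 -> sentence 0; keyfact 1 -> sentences 0 and 1; keyfacts 2, 3 unmatched.
alignment: Alignment = {0: {0}, 1: {0, 1}, 2: set(), 3: set()}
print(completeness(alignment, num_keyfacts=4))  # 0.5
print(round(conciseness(alignment, num_sentences=3), 3))  # 0.667
```

Scoring against individual sentences and keyfacts, instead of asking for one Likert score, makes each number traceable. A low conciseness score points to the specific sentences that carry no keyfact.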

Contribution

FineSurE is an LLM-based evaluator that assesses summaries along multiple dimensions (faithfulness, completeness, and conciseness) at the sentence level. Existing metrics assign only a single summary-level score.

Findings

01. Outperforms state-of-the-art methods on completeness and conciseness.

02. Enables sentence-level hallucination detection.

03. Demonstrates versatility across various LLM backbones.

Abstract

Automated evaluation is crucial for streamlining text summarization benchmarking and model development, given the costly and time-consuming nature of human evaluation. Traditional methods like ROUGE do not correlate well with human judgment, while recently proposed LLM-based metrics provide only summary-level assessment using Likert-scale scores. This limits deeper model analysis, e.g., we can only assign one hallucination score at the summary level, while at the sentence level, we can count sentences containing hallucinations. To remedy those limitations, we propose FineSurE, a fine-grained evaluator specifically tailored for the summarization task using large language models (LLMs). It also employs completeness and conciseness criteria, in addition to faithfulness, enabling multi-dimensional assessment. We compare various open-source and proprietary LLMs as backbones for FineSurE. In…
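The abstract contrasts a single summary-level hallucination score with sentence-level counting. The sketch below illustrates that difference. It assumes the LLM judge returns one hallucinated / not-hallucinated label per summary sentence. The excerpt does not describe FineSurE's actual prompt, error categories, or aggregation, so the helper `faithfulness` and the labels are hypothetical.

```python
from typing import List


def faithfulness(hallucinated: List[bool]) -> float:
    # Share of summary sentences with no detected factual error.
    # Assumes one boolean label per sentence (True = contains a hallucination).
    if not hallucinated:
        raise ValueError("summary has no sentences")
    return sum(not h for h in hallucinated) / len(hallucinated)


# Hypothetical per-sentence labels for a 4-sentence summary.
labels = [False, True, False, False]

print(faithfulness(labels))  # 0.75
# Sentence-level labels also say *which* sentences are unfaithful.
print([i for i, h in enumerate(labels) if h])  # [1]
```

A summary-level Likert score would collapse this into one number. The per-sentence labels keep both the count of hallucinated sentences and their positions, which supports the deeper model analysis the abstract describes.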

Videos

FineSurE: Fine-grained Summarization Evaluation using LLMs (Underline)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management