EVA-Score: Evaluating Abstractive Long-form Summarization on Informativeness through Extraction and Validation

Yuchen Fan; Yazhe Wan; Xin Zhong; Haonan Cheng; Ning Ding; Bowen Zhou

arXiv:2407.04969 · cs.CL · January 30, 2026

Open Access

TL;DR

EVA-Score is a new evaluation metric for abstractive long-form summarization. It measures informativeness by extracting and validating information content, and it shows higher correlation with human judgments than existing metrics.

Contribution

The paper introduces EVA-Score, a novel informativeness-based evaluation metric for long-form summaries, addressing limitations of similarity-based and LLM-based metrics.

Findings

01. EVA-Score correlates highly with human judgments.

02. LLMs still lag behind humans in informativeness.

03. EVA-Score effectively evaluates long-form summarization quality.

Abstract

Since the emergence of LLMs, abstractive long-form summarization has received growing attention, as longer input sequences carry more information. Nevertheless, the automatic evaluation of such summaries remains underexplored. Current evaluation relies either on similarity-based metrics such as ROUGE and BERTScore, or on LLM-based metrics that use hand-crafted prompts or pre-defined schemas. We argue that the former capture only surface-level similarity and fail to consider informativeness, while the latter lack quantitative analysis of informational richness, are subjective and hard to explain, and are not robust because they are easily overwhelmed by long contexts. In this…

Taxonomy

Topics: Technology Assessment and Management · Human-Automation Interaction and Safety

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Adam · Dropout