EVA-Score: Evaluating Abstractive Long-form Summarization on Informativeness through Extraction and Validation
Yuchen Fan, Yazhe Wan, Xin Zhong, Haonan Cheng, Ning Ding, Bowen Zhou

TL;DR
EVA-Score is a new evaluation metric for abstractive long-form summarization that measures informativeness by extracting and validating information content, showing higher correlation with human judgments than existing metrics.
Contribution
The paper introduces EVA-Score, a novel informativeness-based evaluation metric for long-form summaries, addressing limitations of similarity-based and LLM-based metrics.
Findings
EVA-Score correlates highly with human judgments.
LLMs still lag behind humans in informativeness.
EVA-Score effectively evaluates long-form summarization quality.
Abstract
Since the emergence of LLMs, abstractive long-form summarization has drawn growing attention, since longer input sequences carry more information. Nevertheless, the automatic evaluation of such summaries remains underexplored. Current evaluation relies either on similarity-based metrics such as ROUGE and BERTScore or on LLM-based metrics driven by prompts or pre-defined schemas. We argue that the former capture only surface-level similarity and fail to consider informativeness, while the latter lack a quantitative analysis of informational richness, are rather subjective and hard to explain, and are not robust and easily overwhelmed by long contexts. In this…
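To make the abstract's criticism of similarity-based metrics concrete, here is a minimal sketch of a unigram-overlap F1 in the spirit of ROUGE-1. It is a simplification (lowercased whitespace tokens, no stemming), not the official ROUGE implementation and not EVA-Score, whose procedure is not described in this excerpt. The function name `unigram_f1` and the example sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 (assumption: lowercase whitespace tokens)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference summary and two hypothetical candidates.
reference = "the trial enrolled 500 patients and the drug reduced mortality by 30 percent"

# Conveys the same facts in different words.
paraphrase = "the drug cut deaths by nearly a third among 500 trial participants"

# Copies the reference wording but omits the sample size and the result.
uninformative = "the trial enrolled patients and the drug"

paraphrase_score = unigram_f1(paraphrase, reference)
uninformative_score = unigram_f1(uninformative, reference)

print(f"paraphrase:    {paraphrase_score:.2f}")
print(f"uninformative: {uninformative_score:.2f}")
```

In this toy case the candidate that drops the key facts scores higher than the paraphrase that keeps them, because the score rewards shared wording rather than shared information.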
Taxonomy
Topics: Technology Assessment and Management · Human-Automation Interaction and Safety
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Adam · Dropout
