Exploring the features used for summary evaluation by Human and GPT

Zahra Sadeghi; Evangelos Milios; Frank Rudzicz

arXiv:2512.19620·cs.CL·December 23, 2025

Open Access

TL;DR

This paper investigates which features humans and GPTs rely on when evaluating summaries, analyzes how well the two align, and shows that instructing GPTs to use the metrics humans employ improves their judgments.

Contribution

The paper identifies features that align with human and GPT responses and demonstrates that instructing GPTs to use human metrics enhances their evaluation accuracy.

Findings

01. Features aligned with human and GPT responses are identified.

02. Instructing GPTs with human metrics improves their evaluation consistency.

03. The mapping between evaluation scores and metrics is better understood.

Abstract

Summary assessment involves evaluating how well a generated summary reflects the key ideas and meaning of the source text, requiring a deep understanding of the content. Large Language Models (LLMs) have been used to automate this process, acting as judges to evaluate summaries with respect to the original text. While previous research investigated the alignment between LLMs and Human responses, it is not yet well understood what properties or features are exploited by them when asked to evaluate based on a particular quality dimension, and there has not been much attention towards mapping between evaluation scores and metrics. In this paper, we address this issue and discover features aligned with Human and Generative Pre-trained Transformers (GPTs) responses by studying statistical and machine learning metrics. Furthermore, we show that instructing GPTs to employ metrics used by Human…
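
As a rough illustration of what "features aligned with Human and GPT responses" could mean in practice, the sketch below computes a Spearman rank correlation between one candidate feature and two sets of judge scores. This is a minimal sketch, not the authors' method: the feature values, the scores, and the helper names (`ranks`, `pearson`, `spearman`) are all hypothetical, and the abstract as shown here does not say which statistical or machine learning metrics the paper actually uses.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: one feature value and two judge scores per summary.
feature = [0.10, 0.35, 0.50, 0.62, 0.80]   # e.g. some overlap-style feature (assumed)
human_scores = [1, 2, 3, 4, 5]             # assumed human ratings
gpt_scores = [2, 1, 3, 5, 4]               # assumed GPT ratings

human_rho = spearman(feature, human_scores)
gpt_rho = spearman(feature, gpt_scores)
print(f"feature vs human: {human_rho:.2f}")
print(f"feature vs GPT:   {gpt_rho:.2f}")
```

With this toy data the feature tracks the human ratings more closely than the GPT ratings, which is the kind of gap an alignment analysis would look for. In real use, a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written helpers.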

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Computational and Text Analysis Methods