Can GPT models Follow Human Summarization Guidelines? A Study for Targeted Communication Goals

Yongxin Zhou; Fabien Ringeval; François Portet

arXiv:2310.16810·cs.CL·October 7, 2025·1 cites

TL;DR

This paper evaluates GPT models' ability to generate dialogue summaries that follow human guidelines, comparing their performance to task-specific models across multiple datasets and assessment methods.

Contribution

It demonstrates GPT models' effectiveness in adhering to human summarization guidelines and highlights challenges in automatic evaluation metrics.

Findings

01

GPT-generated summaries are preferred over those from task-specific pre-trained models and over the reference summaries.

02

GPT models follow guidelines but sometimes produce longer outputs.

03

Discrepancies exist between automatic metrics and human judgments.
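
One way such a discrepancy can arise is sketched below. This is a simplified unigram-overlap F1 in the spirit of ROUGE-1 (lowercased whitespace tokens, no stemming), not the scoring setup used in the paper. The function name `rouge1_f1` and all three example sentences are hypothetical and are not drawn from DialogSum or DECODA.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical texts for illustration only.
reference = "tom asks anna to book a table for dinner"
close_copy = "tom asks anna to book a table"
paraphrase = (
    "anna is requested by tom to reserve a restaurant spot "
    "for this evening and she agrees"
)

score_copy = rouge1_f1(close_copy, reference)
score_paraphrase = rouge1_f1(paraphrase, reference)
print(f"close copy: {score_copy:.3f}, paraphrase: {score_paraphrase:.3f}")
```

Under these assumptions the longer paraphrase scores 0.400 against 0.875 for the near copy, even though a human reader might judge both as conveying the same request. Lexical-overlap metrics penalize rewording and extra length regardless of whether the content is acceptable.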

Abstract

This study investigates the ability of GPT models (ChatGPT, GPT-4 and GPT-4o) to generate dialogue summaries that adhere to human guidelines. Our evaluation involved experimenting with various prompts to guide the models in complying with guidelines on two datasets: DialogSum (English social conversations) and DECODA (French call center interactions). Human evaluation, based on summarization guidelines, served as the primary assessment method, complemented by extensive quantitative and qualitative analyses. Our findings reveal a preference for GPT-generated summaries over those from task-specific pre-trained models and reference summaries, highlighting GPT models' ability to follow human guidelines despite occasionally producing longer outputs and exhibiting divergent lexical and structural alignment with references. The discrepancy between ROUGE, BERTScore, and human evaluation…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Linear Layer · Adam · Attention Dropout · Position-Wise Feed-Forward Layer · Discriminative Fine-Tuning