Can GPT models Follow Human Summarization Guidelines? A Study for Targeted Communication Goals
Yongxin Zhou, Fabien Ringeval, François Portet

TL;DR
This paper evaluates whether GPT models can generate dialogue summaries that follow human guidelines, comparing them with task-specific models on two datasets (DialogSum and DECODA) using human evaluation and automatic metrics.
Contribution
It demonstrates GPT models' effectiveness in adhering to human summarization guidelines and highlights challenges in automatic evaluation metrics.
Findings
Human evaluators prefer GPT-generated summaries over those from task-specific pre-trained models and over reference summaries.
GPT models follow guidelines but sometimes produce longer outputs.
Automatic metrics (ROUGE, BERTScore) diverge from human judgments.
Abstract
This study investigates the ability of GPT models (ChatGPT, GPT-4 and GPT-4o) to generate dialogue summaries that adhere to human guidelines. Our evaluation involved experimenting with various prompts to guide the models in complying with guidelines on two datasets: DialogSum (English social conversations) and DECODA (French call center interactions). Human evaluation, based on summarization guidelines, served as the primary assessment method, complemented by extensive quantitative and qualitative analyses. Our findings reveal a preference for GPT-generated summaries over those from task-specific pre-trained models and reference summaries, highlighting GPT models' ability to follow human guidelines despite occasionally producing longer outputs and exhibiting divergent lexical and structural alignment with references. The discrepancy between ROUGE, BERTScore, and human evaluation…
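The abstract links part of the gap between ROUGE and human judgments to GPT outputs being longer and lexically different from the references. The snippet below is a minimal sketch of ROUGE-1 F1 (unigram overlap), not the official ROUGE toolkit or the paper's evaluation code; it assumes lowercase whitespace tokenization with no stemming, and the example sentences are hypothetical. It shows how extra words in a longer summary lower precision, and therefore F1, even when every reference word is recalled.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: lowercase whitespace tokens, no stemming.

    Scores will differ from the official ROUGE package, which applies
    its own tokenization and optional stemming.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most min(count in cand, count in ref).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and a longer candidate that still covers every reference word.
reference = "the customer asks how to cancel the contract"
longer = ("the customer calls because they want to cancel their contract "
          "and asks the agent how to proceed")

print(round(rouge1_f1(reference, reference), 2))  # 1.0
print(round(rouge1_f1(longer, reference), 2))     # 0.64 (recall 1.0, precision 8/17)
```

Recall stays at 1.0 for the longer candidate, so the drop comes entirely from precision; a human judge reading the same summary may not penalize the extra detail at all.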
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Linear Layer · Adam · Attention Dropout · Position-Wise Feed-Forward Layer · Discriminative Fine-Tuning
