Human-like Summarization Evaluation with ChatGPT

Mingqi Gao; Jie Ruan; Renliang Sun; Xunjian Yin; Shiping Yang; Xiaojun Wan

arXiv:2304.02554·cs.CL·April 6, 2023·38 cites

TL;DR

This paper investigates ChatGPT's capability to evaluate text summarization in a human-like manner, comparing it with traditional metrics and human judgment across multiple datasets.

Contribution

It demonstrates ChatGPT's effectiveness in human-like evaluation methods and its potential to outperform automatic metrics in summarization assessment.

Findings

01. ChatGPT can carry out Likert scale scoring, pairwise comparison, Pyramid, and binary factuality evaluation.

02. It outperforms commonly used automatic evaluation metrics on some datasets.

03. Prompt design influences evaluation performance.
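One common way to check whether an evaluator "outperforms" an automatic metric is to compare how well each one's scores rank summaries relative to human judgments. The sketch below computes Spearman rank correlation in plain Python. It is an illustration only: this page does not state which correlation measure the paper uses, and the scores in the example are made-up placeholders, not results from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Placeholder scores for five summaries (hypothetical, not from the paper).
    human_scores = [4, 2, 5, 3, 1]
    llm_scores = [4, 3, 5, 2, 1]
    metric_scores = [0.31, 0.35, 0.33, 0.28, 0.30]
    print("LLM vs human:   ", spearman(llm_scores, human_scores))
    print("metric vs human:", spearman(metric_scores, human_scores))
```

Under this framing, the evaluator whose scores correlate more strongly with the human scores on a dataset would be the one that "outperforms" on that dataset.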

Abstract

Evaluating text summarization is a challenging problem, and existing evaluation metrics are far from satisfactory. In this study, we explored ChatGPT's ability to perform human-like summarization evaluation using four human evaluation methods on five datasets. We found that ChatGPT was able to complete annotations relatively smoothly using Likert scale scoring, pairwise comparison, Pyramid, and binary factuality evaluation. Additionally, it outperformed commonly used automatic evaluation metrics on some datasets. Furthermore, we discussed the impact of different prompts, compared its performance with that of human evaluation, and analyzed the generated explanations and invalid responses.
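To make the Likert scale setting concrete, here is a minimal sketch of how a scoring prompt might be assembled and how a model's reply might be parsed, including the invalid responses the abstract mentions. The prompt wording, the function names `build_likert_prompt` and `parse_likert_score`, and the 1 to 5 range are assumptions chosen for illustration; they are not the paper's actual prompts, and no model API is called here.

```python
import re
from typing import Optional


def build_likert_prompt(article: str, summary: str,
                        dimension: str = "relevance",
                        low: int = 1, high: int = 5) -> str:
    """Assemble a hypothetical Likert-style scoring prompt."""
    return (
        f"Evaluate the {dimension} of the summary for the article below "
        f"on a scale from {low} (worst) to {high} (best). "
        "Reply with the score only.\n\n"
        f"Article: {article}\n\n"
        f"Summary: {summary}\n\n"
        "Score:"
    )


# Accept replies such as "4" or "Score: 4"; reject decimals like "4.5".
_SCORE_PATTERN = re.compile(r"\s*(?:score\s*[:=]\s*)?(\d+)(?!\d|\.\d)",
                            re.IGNORECASE)


def parse_likert_score(response: str, low: int = 1,
                       high: int = 5) -> Optional[int]:
    """Return the integer score, or None if the reply is invalid."""
    match = _SCORE_PATTERN.match(response)
    if match is None:
        return None
    score = int(match.group(1))
    # Scores outside the requested scale count as invalid responses.
    return score if low <= score <= high else None


if __name__ == "__main__":
    prompt = build_likert_prompt("Some article text.", "Some summary.")
    print(prompt)
    for reply in ["4", "Score: 3", "I cannot rate this summary.", "9"]:
        print(repr(reply), "->", parse_likert_score(reply))
```

Returning `None` for unparseable or out-of-range replies keeps invalid responses visible so they can be counted and analyzed rather than silently coerced into a score.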

Code & Models

Repositories

raymin0223/fast_robust_early_exit
pytorch

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques