Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT

Hassan Shakil; Atqiya Munawara Mahi; Phuoc Nguyen; Zeydy Ortiz; Mamoun T. Mardini

arXiv:2405.04053 · cs.CL · May 8, 2024 · 2 cites

Open Access

TL;DR

This study explores using OpenAI's GPT models as independent evaluators of text summaries, comparing their assessments with traditional metrics to determine how effectively they measure summary quality.

Contribution

It introduces GPT as a novel, independent evaluator for text summaries, providing a new approach that complements traditional evaluation metrics.

Findings

01. GPT evaluations correlate well with traditional metrics
02. GPT effectively assesses relevance and coherence
03. Potential for GPT to improve summary evaluation processes

Abstract

This research examines the effectiveness of OpenAI's GPT models as independent evaluators of text summaries generated by six transformer-based models from Hugging Face: DistilBART, BERT, ProphetNet, T5, BART, and PEGASUS. We evaluated these summaries based on essential properties of a high-quality summary - conciseness, relevance, coherence, and readability - using traditional metrics such as ROUGE and Latent Semantic Analysis (LSA). Uniquely, we also employed GPT not as a summarizer but as an evaluator, allowing it to independently assess summary quality without predefined metrics. Our analysis revealed significant correlations between GPT evaluations and traditional metrics, particularly in assessing relevance and coherence. The results demonstrate GPT's potential as a robust tool for evaluating text summaries, offering insights that complement established metrics and providing a basis…
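
The comparison described in the abstract, scoring summaries with a traditional metric and then checking how those scores track an evaluator's ratings, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the excerpt does not say which ROUGE variant or correlation coefficient was used, so the sketch assumes a simple whitespace-tokenized ROUGE-1 F1 and Pearson's r. The function names and the `gpt_scores` values are hypothetical placeholders, not data from the paper.

```python
from collections import Counter
from math import sqrt


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 (a simplified ROUGE-1) with whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy inputs for illustration only; none of this comes from the paper.
    reference = "the model summarizes long articles into short text"
    candidates = [
        "the model summarizes long articles",
        "a model writes short text",
        "weather was sunny today",
    ]
    rouge_scores = [rouge1_f1(reference, c) for c in candidates]
    # Hypothetical evaluator ratings; in practice these would come from an LLM.
    gpt_scores = [0.9, 0.5, 0.1]
    print("ROUGE-1 F1:", rouge_scores)
    print("Pearson r:", pearson(rouge_scores, gpt_scores))
```

In practice a maintained ROUGE implementation with stemming and proper tokenization would be preferable to this simplified overlap count, and a rank correlation may be more appropriate when evaluator ratings are ordinal.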

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Weight Decay · Cosine Annealing · Gated Linear Unit · Attention Dropout · Adafactor · Dropout · Linear Warmup With Cosine Annealing · Residual Connection