Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPT
Hassan Shakil, Atqiya Munawara Mahi, Phuoc Nguyen, Zeydy Ortiz, Mamoun T. Mardini

TL;DR
This study explores using OpenAI's GPT models as independent evaluators of text summaries, comparing their assessments with traditional metrics to determine how effectively they measure summary quality.
Contribution
The paper introduces GPT as a novel, independent evaluator of text summaries, offering an approach that complements traditional evaluation metrics.
Findings
GPT evaluations correlate significantly with traditional metrics
GPT effectively assesses relevance and coherence
Potential for GPT to improve summary evaluation processes
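The findings above rest on correlating GPT's ratings with traditional metric scores. This summary does not state which correlation coefficient the authors used, so the following is only a minimal sketch, assuming per-summary scores compared with Pearson (linear) and Spearman (rank) coefficients. The function names and the score lists in the usage lines are hypothetical placeholders, not data from the study.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical placeholder scores for four summaries (not from the paper).
    gpt_ratings = [4.0, 3.5, 2.0, 4.5]
    rouge_scores = [0.42, 0.38, 0.21, 0.47]
    print("Pearson:", pearson(gpt_ratings, rouge_scores))
    print("Spearman:", spearman(gpt_ratings, rouge_scores))
```

Spearman is included alongside Pearson because LLM ratings are often on a coarse ordinal scale, where rank agreement can be more meaningful than linear agreement.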
Abstract
This research examines the effectiveness of OpenAI's GPT models as independent evaluators of text summaries generated by six transformer-based models from Hugging Face: DistilBART, BERT, ProphetNet, T5, BART, and PEGASUS. We evaluated these summaries on the essential properties of a high-quality summary (conciseness, relevance, coherence, and readability) using traditional metrics such as ROUGE and Latent Semantic Analysis (LSA). Uniquely, we also employed GPT not as a summarizer but as an evaluator, allowing it to assess summary quality independently, without predefined metrics. Our analysis revealed significant correlations between GPT evaluations and traditional metrics, particularly in assessing relevance and coherence. The results demonstrate GPT's potential as a robust tool for evaluating text summaries, offering insights that complement established metrics and providing a basis…
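ROUGE, one of the traditional metrics named in the abstract, scores a candidate summary by its overlap with a reference summary. The following is a minimal sketch of ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence), each reported as a balanced F1 score. It assumes lowercase whitespace tokenization, which is a simplification: the study's exact ROUGE implementation and settings are not given here, and reference toolkits also strip punctuation and can apply stemming. The helper names are hypothetical.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def f1(overlap: int, cand_len: int, ref_len: int) -> float:
    """Balanced F-measure from an overlap count and the two lengths."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    def ngrams(tokens: List[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(tokenize(candidate))
    ref = ngrams(tokenize(reference))
    overlap = sum((cand & ref).values())  # Counter & keeps the min count per n-gram
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = tokenize(candidate)
    ref = tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat is on the mat"
    print("ROUGE-1:", rouge_n(candidate, reference, n=1))
    print("ROUGE-2:", rouge_n(candidate, reference, n=2))
    print("ROUGE-L:", rouge_l(candidate, reference))
```

ROUGE-L rewards in-order word matches without requiring them to be contiguous, which is why it is often reported alongside ROUGE-1 and ROUGE-2 when judging summary relevance.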
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Weight Decay · Cosine Annealing · Gated Linear Unit · Attention Dropout · Adafactor · Dropout · Linear Warmup With Cosine Annealing · Residual Connection
