Play the Shannon Game With Language Models: A Human-Free Approach to Summary Evaluation
Nicholas Egan, Oleg Vasilyev, John Bohannon

TL;DR
This paper introduces reference-free summary evaluation metrics using pretrained language models, inspired by the Shannon Game, to assess summary quality without human annotators, showing strong correlation with human judgments.
Contribution
The paper presents a human-free method for evaluating summaries with pretrained language models. It extends both the Shannon Game and BLANC, and it improves correlation with human judgments of coherence and relevance.
Findings
The proposed metrics achieve state-of-the-art correlation with human judgments of coherence and relevance.
The metrics show competitive correlation with human judgments of consistency and fluency.
The approach is validated empirically with transformer-based language models across multiple summary quality dimensions.
Abstract
The goal of a summary is to concisely state the most important information in a document. With this principle in mind, we introduce new reference-free summary evaluation metrics that use a pretrained language model to estimate the information content shared between a document and its summary. These metrics are a modern take on the Shannon Game, a method for summary quality scoring proposed decades ago, where we replace human annotators with language models. We also view these metrics as an extension of BLANC, a recently proposed approach to summary quality measurement based on the performance of a language model with and without the help of a summary. Using transformer-based language models, we empirically verify that our metrics achieve state-of-the-art correlation with human judgement of the summary quality dimensions of both coherence and relevance, as well as competitive correlation…
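The "with and without the help of a summary" idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it swaps the pretrained transformer for an add-alpha smoothed unigram model so that it runs with the standard library only, and the names `log_likelihood` and `information_gain`, the fixed `vocab` argument, and the example sentences are all hypothetical. It only shows the shape of the comparison: score the document once with no context, once with the summary as context, and take the difference in log-likelihood.

```python
import math
from collections import Counter


def log_likelihood(doc_tokens, context_tokens, vocab, alpha=1.0):
    """Log-probability of doc_tokens under an add-alpha unigram model
    estimated from context_tokens (a toy stand-in for a pretrained LM)."""
    counts = Counter(context_tokens)
    total = len(context_tokens) + alpha * len(vocab)
    return sum(math.log((counts[t] + alpha) / total) for t in doc_tokens)


def information_gain(document, summary, vocab):
    """Difference in document log-likelihood with and without the summary.

    A positive value means the summary made the document easier to predict.
    """
    doc = document.lower().split()
    summ = summary.lower().split()
    without_help = log_likelihood(doc, [], vocab)   # no context: uniform over vocab
    with_help = log_likelihood(doc, summ, vocab)    # summary used as context
    return with_help - without_help


# Hypothetical example data, chosen only for illustration.
document = "the cat sat on the mat"
good_summary = "cat sat mat"
bad_summary = "dogs bark loudly"

# A fixed vocabulary keeps scores comparable across candidate summaries.
vocab = set((document + " " + good_summary + " " + bad_summary).split())

good_score = information_gain(document, good_summary, vocab)
bad_score = information_gain(document, bad_summary, vocab)
```

In this toy setting the on-topic summary raises the document's log-likelihood and the off-topic one lowers it. A real pretrained language model would condition on word order and context rather than on bag-of-words counts.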
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Linear Layer · BLANC · Cosine Annealing · Weight Decay · Dropout · Linear Warmup With Cosine Annealing · Multi-Head Attention · Dense Connections · Softmax · Discriminative Fine-Tuning
