Finding a Balanced Degree of Automation for Summary Evaluation
Shiyue Zhang, Mohit Bansal

TL;DR
This paper introduces a flexible family of summary evaluation metrics, ranging from semi-automatic to fully automatic, that balances the reliability of human judgment against the efficiency of automation while maintaining good correlation with human assessments.
Contribution
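It proposes the Lite2Pyramid, Lite3Pyramid, and Lite2.xPyramid metrics. These replace parts of the manual Pyramid procedure with natural language inference (NLI) and semantic role labeling (SRL) models, and they differ in how much human labeling they keep. The result is summary evaluation that is more reproducible and cost-effective.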
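The in-between Lite2.xPyramid idea of mixing human-labeled units with automatic ones can be illustrated with a small sketch. It picks, for each reference sentence, either its human SCUs or its automatic STUs, using a regressor's per-sentence prediction of how well the STUs simulate the SCUs. This is not the authors' implementation. The following are assumptions of this sketch: the function name `mix_units`, the per-sentence data layout, the hypothetical `keep_fraction` knob, and the rule that the sentences with the lowest predicted quality keep their SCUs.

```python
from typing import List, Sequence


def mix_units(sentence_scus: Sequence[List[str]],
              sentence_stus: Sequence[List[str]],
              predicted_quality: Sequence[float],
              keep_fraction: float) -> List[str]:
    """Build a mixed SCU/STU unit set for one reference (hypothetical helper).

    For reference sentence i:
      sentence_scus[i]     -- human-labeled SCUs
      sentence_stus[i]     -- automatically extracted STUs
      predicted_quality[i] -- a regressor's estimate of how well the STUs
                              simulate the SCUs (higher = better stand-in)
    ASSUMPTION: round(keep_fraction * n) sentences with the LOWEST predicted
    quality keep their human SCUs; all other sentences use STUs.
    """
    n = len(sentence_scus)
    if not (len(sentence_stus) == len(predicted_quality) == n):
        raise ValueError("per-sentence inputs must have equal length")
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")

    n_keep = round(keep_fraction * n)
    # Sentence indices ordered from worst to best predicted STU quality.
    order = sorted(range(n), key=lambda i: predicted_quality[i])
    keep_scu = set(order[:n_keep])

    units: List[str] = []
    for i in range(n):
        units.extend(sentence_scus[i] if i in keep_scu else sentence_stus[i])
    return units
```

In this sketch, `keep_fraction=0.0` reduces to an all-STU unit set, which is fully automatic like Lite3Pyramid. `keep_fraction=1.0` reduces to an all-SCU unit set, which is semi-automatic like Lite2Pyramid. Values in between trade manual effort for expected correlation.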
Findings
Lite2Pyramid achieves the best summary-level correlations.
Lite3Pyramid performs comparably to existing automatic metrics.
Lite2.xPyramid offers a trade-off between correlation and manual effort.
Abstract
Human evaluation for summarization tasks is reliable but brings in issues of reproducibility and high costs. Automatic metrics are cheap and reproducible but sometimes poorly correlated with human judgment. In this work, we propose flexible semi-automatic to automatic summary evaluation metrics, following the Pyramid human evaluation method. Semi-automatic Lite2Pyramid retains the reusable human-labeled Summary Content Units (SCUs) for reference(s) but replaces the manual work of judging SCUs' presence in system summaries with a natural language inference (NLI) model. Fully automatic Lite3Pyramid further substitutes SCUs with automatically extracted Semantic Triplet Units (STUs) via a semantic role labeling (SRL) model. Finally, we propose in-between metrics, Lite2.xPyramid, where we use a simple regressor to predict how well the STUs can simulate SCUs and retain SCUs that are more…
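The scoring step described in the abstract can be sketched as follows. An entailment model judges whether each content unit (an SCU, or an STU built from SRL output) is present in a system summary, and the per-unit judgments are averaged. This is a sketch under stated assumptions, not the authors' code:

* `EntailFn` stands in for the trained NLI model.
* `toy_entail` is a lexical-overlap placeholder, NOT an NLI model, included only so the example runs.
* The SRL frame format (dicts with `ARG0`, `V`, `ARG1`) is hypothetical.
* Averaging presence over units, either as soft probabilities or thresholded labels, is an assumption in the spirit of the Lite Pyramid scoring.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical interface: returns P(premise entails hypothesis). In the paper this
# role is played by a trained NLI model; any callable with this signature works here.
EntailFn = Callable[[str, str], float]


def triplet_units(frames: Sequence[Dict[str, str]]) -> List[str]:
    """Turn SRL-style frames into simple triplet sentences (a rough STU stand-in).

    ASSUMPTION: each frame is a dict with a required predicate "V" and optional
    "ARG0"/"ARG1" arguments; frames without a predicate are skipped.
    """
    units = []
    for frame in frames:
        if "V" not in frame:
            continue
        parts = [frame.get("ARG0", ""), frame["V"], frame.get("ARG1", "")]
        units.append(" ".join(p for p in parts if p))
    return units


def lite_pyramid_score(summary: str, units: Sequence[str], entail: EntailFn,
                       threshold: Optional[float] = None) -> float:
    """Mean presence of content units in a system summary.

    threshold=None uses the entailment probability as a soft presence score;
    otherwise a unit counts as present (1.0) iff its probability >= threshold.
    """
    if not units:
        raise ValueError("need at least one content unit")
    total = 0.0
    for unit in units:
        p = entail(summary, unit)  # premise = system summary, hypothesis = unit
        total += p if threshold is None else float(p >= threshold)
    return total / len(units)


def toy_entail(premise: str, hypothesis: str) -> float:
    """Placeholder, NOT an NLI model: share of hypothesis tokens found in the premise."""
    prem = set(premise.lower().split())
    hyp = hypothesis.lower().split()
    return sum(tok in prem for tok in hyp) / len(hyp) if hyp else 0.0


if __name__ == "__main__":
    scus = ["the cat sat on the mat", "the dog barked"]
    summary = "the cat sat on the mat quietly"
    print(round(lite_pyramid_score(summary, scus, toy_entail), 3))  # 0.667
    print(lite_pyramid_score(summary, scus, toy_entail, threshold=0.5))  # 0.5
```

Because the scorer only sees a list of unit strings, the same function serves the semi-automatic case (human SCUs) and the fully automatic case (STUs from `triplet_units`). Swapping `toy_entail` for a real NLI model is the only change needed to move from this toy setup toward the described metric.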
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
