Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics
Théo Gigant (L2S), Camille Guinaudeau (STL, LISN), Marc Decombas, Frédéric Dufaux (L2S)

TL;DR
This paper introduces a new reference-free evaluation metric for summarization that correlates well with human judgments and enhances the robustness of existing reference-based metrics, especially when reference quality is poor.
Contribution
The authors propose a novel reference-free metric that is computationally inexpensive and improves the evaluation of summarization systems by reducing dependence on reference quality.
Findings
The new metric correlates strongly with human relevance judgments.
It enhances the robustness of reference-based metrics in low-quality reference scenarios.
The metric is cheap to compute.
Abstract
Automatic metrics are used as proxies to evaluate abstractive summarization systems when human annotations are too expensive. To be useful, these metrics should be fine-grained, show a high correlation with human annotations, and ideally be independent of reference quality; however, most standard evaluation metrics for summarization are reference-based, and existing reference-free metrics correlate poorly with relevance, especially on summaries of longer documents. In this paper, we introduce a reference-free metric that correlates well with human-evaluated relevance, while being very cheap to compute. We show that this metric can also be used alongside reference-based metrics to improve their robustness in low-quality reference settings.
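The abstract's claim that a metric "correlates well with human-evaluated relevance" is usually checked with a rank correlation between metric scores and human ratings. The following is a minimal sketch of that check, not the paper's evaluation code: it assumes Spearman's rank correlation as the statistic, and the function names and the toy scores are hypothetical illustrations, not data from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one metric score and one human relevance
# rating per summary (illustrative values only).
metric_scores = [0.12, 0.40, 0.35, 0.80, 0.55]
human_relevance = [1, 3, 2, 5, 4]

rho = spearman(metric_scores, human_relevance)
print(f"Spearman correlation: {rho:.3f}")
```

In this toy case the metric orders the summaries exactly as the human ratings do, so the correlation is at its maximum; with real annotations the value would be lower, and the choice between Spearman, Kendall, or Pearson depends on the evaluation protocol.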
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
