Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics

Théo Gigant (L2S); Camille Guinaudeau (STL, LISN); Marc Decombas; Frédéric Dufaux (L2S)

arXiv:2410.10867 · cs.CL · October 16, 2024

TL;DR

This paper introduces a new reference-free evaluation metric for summarization that correlates well with human judgments and enhances the robustness of existing reference-based metrics, especially when reference quality is poor.

Contribution

The authors propose a novel reference-free metric that is computationally inexpensive and improves the evaluation of summarization systems by reducing dependence on reference quality.

Findings

01. The new metric correlates well with human relevance judgments.

02. It makes reference-based metrics more robust when references are of low quality.

03. The metric is very cheap to compute.

Abstract

Automatic metrics are used as proxies to evaluate abstractive summarization systems when human annotations are too expensive. To be useful, these metrics should be fine-grained, show a high correlation with human annotations, and ideally be independent of reference quality; however, most standard evaluation metrics for summarization are reference-based, and existing reference-free metrics correlate poorly with relevance, especially on summaries of longer documents. In this paper, we introduce a reference-free metric that correlates well with human-evaluated relevance, while being very cheap to compute. We show that this metric can also be used alongside reference-based metrics to improve their robustness in low-quality reference settings.
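
As an illustration of what "correlation with human annotations" means in practice, the sketch below computes a Kendall rank correlation (tau-a) between a metric's scores and human relevance ratings over the same summaries. It is not the paper's metric or evaluation protocol: the function name `kendall_tau` and all score values are hypothetical, chosen only to show the computation.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a between two equally long score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # Positive product: both lists order the pair (i, j) the same way.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values for five summaries (illustrative only, not from the paper).
metric_scores = [0.12, 0.40, 0.35, 0.80, 0.55]  # automatic metric output
human_relevance = [1, 2, 3, 5, 4]               # human relevance ratings

tau = kendall_tau(metric_scores, human_relevance)
print(f"Kendall tau-a: {tau:.2f}")
```

With these made-up values, nine of the ten summary pairs are ordered the same way by the metric and by the human ratings, and one pair is ordered differently. A value near 1 means the metric ranks summaries much as humans do; a value near 0 means the two rankings are unrelated.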

Code & Models

Repositories

giganttheo/importance-based-relevance-score (Official)

Videos

Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics · underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies