A Critical Look at Meta-evaluating Summarisation Evaluation Metrics

Xiang Dai; Sarvnaz Karimi; Biaoyan Fang

arXiv:2409.19507·cs.CL·October 1, 2024

TL;DR

This paper critically examines current practices in meta-evaluating summarisation metrics, highlighting dataset limitations and advocating for diverse benchmarks and user-centric evaluation approaches to improve robustness and relevance.

Contribution

The paper reviews recent meta-evaluation practices for summarisation evaluation metrics, identifies two patterns (heavy reliance on news summarisation datasets and a growing focus on faithfulness), and calls for more diverse benchmarks and user-centric evaluation methods.

Findings

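01. Evaluation metrics are meta-evaluated mainly on datasets drawn from news summarisation.

02. Research focus has shifted noticeably towards evaluating the faithfulness of generated summaries.

03. More diverse benchmarks and user-focused metrics are needed.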

Abstract

Effective summarisation evaluation metrics enable researchers and practitioners to compare different summarisation systems efficiently. Estimating the effectiveness of an automatic evaluation metric, termed meta-evaluation, is a critically important research question. In this position paper, we review recent meta-evaluation practices for summarisation evaluation metrics and find that (1) evaluation metrics are primarily meta-evaluated on datasets consisting of examples from news summarisation datasets, and (2) there has been a noticeable shift in research focus towards evaluating the faithfulness of generated summaries. We argue that the time is ripe to build more diverse benchmarks that enable the development of more robust evaluation metrics and analyze the generalization ability of existing evaluation metrics. In addition, we call for research focusing on user-centric quality…
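
The abstract defines meta-evaluation as estimating the effectiveness of an automatic evaluation metric, but does not say how that estimate is computed. One common approach, assumed here rather than taken from the paper, is to correlate a metric's scores with human judgements of the same summaries. The sketch below illustrates that idea with Pearson and Kendall (tau-a) correlation; the scores and the names `metric_a` and `metric_b` are hypothetical toy values, not results from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs; ties count as neither."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: human quality ratings for five summaries,
# and the scores two imaginary automatic metrics assign to the same summaries.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_a = [0.10, 0.25, 0.30, 0.55, 0.60]  # ranks summaries like the humans do
metric_b = [0.50, 0.20, 0.60, 0.10, 0.40]  # ranks summaries very differently

for name, scores in [("metric_a", metric_a), ("metric_b", metric_b)]:
    r = pearson(human_scores, scores)
    tau = kendall_tau_a(human_scores, scores)
    print(f"{name}: pearson={r:.3f} kendall_tau_a={tau:.3f}")
```

Under this assumption, a metric with higher correlation to human judgements is considered more effective. The paper's concern about dataset coverage applies directly: a correlation measured only on news summaries says little about how the same metric behaves in other domains.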

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques