Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation

Mingqi Gao; Xinyu Hu; Li Lin; Xiaojun Wan

arXiv:2410.16834 · cs.CL · January 28, 2025

TL;DR

This paper systematically analyzes 12 correlation measures for NLG meta-evaluation, revealing how different measures affect evaluation outcomes and proposing perspectives to assess their effectiveness.

Contribution

It provides a comprehensive comparison of correlation measures in NLG meta-evaluation and introduces three perspectives to evaluate their capabilities.

Findings

01. Pearson correlation with global grouping performs best in discriminative power and ranking consistency.

02. Kendall correlation measures are least sensitive to score granularity.

03. Different correlation measures significantly impact meta-evaluation results.

Abstract

The correlation between NLG automatic evaluation metrics and human evaluation is often regarded as a critical criterion for assessing the capability of an evaluation metric. However, different grouping methods and correlation coefficients result in various types of correlation measures used in meta-evaluation. In specific evaluation scenarios, prior work often directly follows conventional measure settings, but the characteristics of these measures and the differences between them have not received sufficient attention. Therefore, this paper analyzes 12 common correlation measures using a large amount of real-world data from six widely used NLG evaluation datasets and 32 evaluation metrics, revealing that different measures indeed impact the meta-evaluation results. Furthermore, we propose three perspectives that reflect the capability of meta-evaluation: discriminative power, ranking consistency,…
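To make the role of grouping concrete, here is a minimal Python sketch of how the same metric and human scores can yield different correlations depending on how they are grouped before a coefficient is applied. It is an illustration under stated assumptions, not the authors' implementation: the function names (`global_corr`, `input_level_corr`, `system_level_corr`), the score layout (one row per input, one column per system), and the toy numbers are all hypothetical, and only Pearson's coefficient is shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Assumed layout: scores[i][j] is the score of system j's output on input i.

def global_corr(metric, human, corr=pearson):
    """Pool every (input, system) pair into one list and correlate once."""
    flat_m = [s for row in metric for s in row]
    flat_h = [s for row in human for s in row]
    return corr(flat_m, flat_h)


def input_level_corr(metric, human, corr=pearson):
    """Correlate across systems within each input, then average over inputs."""
    return mean(corr(m_row, h_row) for m_row, h_row in zip(metric, human))


def system_level_corr(metric, human, corr=pearson):
    """Average each system's scores over inputs, then correlate the system means."""
    sys_m = [mean(col) for col in zip(*metric)]
    sys_h = [mean(col) for col in zip(*human)]
    return corr(sys_m, sys_h)


# Toy data (hypothetical): 2 inputs x 2 systems.
metric_scores = [[1, 2], [3, 4]]
human_scores = [[2, 1], [4, 3]]

print("global      :", global_corr(metric_scores, human_scores))
print("input-level :", input_level_corr(metric_scores, human_scores))
print("system-level:", system_level_corr(metric_scores, human_scores))
```

On this toy data the pooled (global) correlation is positive while the per-input and system-level correlations are negative, because the metric tracks which input is harder but ranks the two systems in the opposite order from the human scores. Passing a different function as `corr` (for example a rank-based coefficient) would give further measures from the same groupings.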

Code & Models

Repositories

kite99520/NLGCorrEval (official)

Videos

Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation (Underline)

Taxonomy

Topics: Meta-analysis and systematic reviews · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)