A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability

Xinyu Hu; Mingqi Gao; Li Lin; Zhenghan Yu; Xiaojun Wan

arXiv:2502.12052·cs.CL·August 18, 2025

Open Access

TL;DR

This paper introduces a dual-perspective NLG meta-evaluation framework that enhances interpretability and automates benchmark creation, enabling more effective assessment of evaluation metrics without additional human annotations.

Contribution

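The paper proposes a dual-perspective framework for NLG meta-evaluation, together with a method for constructing the corresponding benchmarks automatically, to address limitations of traditional approaches such as the handling of human ratings and the ambiguous choice of correlation measure.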

Findings

01. Improved interpretability of evaluation metrics.

02. Effective automatic benchmark generation.

03. Comprehensive analysis of 16 LLM evaluators.

Abstract

In NLG meta-evaluation, evaluation metrics are typically assessed based on their consistency with humans. However, we identify some limitations in traditional NLG meta-evaluation approaches, such as issues in handling human ratings and ambiguous selections of correlation measures, which undermine the effectiveness of meta-evaluation. In this work, we propose a dual-perspective NLG meta-evaluation framework that focuses on different evaluation capabilities, thereby providing better interpretability. In addition, we introduce a method of automatically constructing the corresponding benchmarks without requiring new human annotations. Furthermore, we conduct experiments with 16 representative LLMs as the evaluators based on our proposed framework, comprehensively analyzing their evaluation performance from different perspectives.
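
To make the "ambiguous selections of correlation measures" concern concrete, the following is a minimal sketch of the traditional setup the abstract describes: scoring a metric by its correlation with human ratings. It is not the paper's proposed framework. The ratings and the two metrics (`metric_a`, `metric_b`) are hypothetical toy values chosen for illustration, and the helper names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: sensitive to the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: human ratings for five outputs and two metrics.
human_ratings = [1, 2, 3, 4, 5]
metric_a = [1, 2, 3, 4, 100]          # perfectly monotone, one extreme score
metric_b = [1.0, 3.0, 2.0, 4.0, 5.0]  # nearly linear, one swapped pair

for name, scores in [("metric_a", metric_a), ("metric_b", metric_b)]:
    print(name,
          "pearson=%.3f" % pearson(human_ratings, scores),
          "spearman=%.3f" % spearman(human_ratings, scores))
```

On this toy data, Pearson prefers `metric_b` while Spearman prefers `metric_a`, so the ranking of metrics depends on which correlation measure is chosen.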

Taxonomy

Topics: Software Engineering Research · Intelligent Tutoring Systems and Adaptive Learning · Model-Driven Software Engineering Techniques