DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering

Pei Ke; Fei Huang; Fei Mi; Yasheng Wang; Qun Liu; Xiaoyan Zhu; Minlie Huang

arXiv:2307.06869·cs.CL·July 14, 2023

TL;DR

DecompEval is a novel, unsupervised evaluation metric for natural language generation that uses instruction-tuned language models to assess text quality through decomposed, interpretable subquestions, achieving state-of-the-art results.

Contribution

Proposes DecompEval, an unsupervised, instruction-based evaluation method that enhances generalization and interpretability without training on evaluation datasets.

Findings

01

Achieves state-of-the-art performance among untrained metrics in text summarization and dialogue generation evaluation.

02

Demonstrates strong generalization across different tasks and evaluation dimensions.

03

Provides interpretable evidence through subquestion decomposition.

Abstract

Existing evaluation metrics for natural language generation (NLG) tasks face challenges in generalization ability and interpretability. Specifically, most well-performing metrics must be trained on evaluation datasets for specific NLG tasks and evaluation dimensions, which may cause over-fitting to task-specific datasets. Furthermore, existing metrics provide only an evaluation score for each dimension without revealing the evidence that explains how the score is obtained. To address these challenges, we propose a simple yet effective metric called DecompEval. This metric formulates NLG evaluation as an instruction-style question answering task and uses instruction-tuned pre-trained language models (PLMs) without training on evaluation datasets, aiming to enhance generalization ability. To make the evaluation process more interpretable, we decompose our devised…
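To make the idea of instruction-style, decomposed evaluation concrete, here is a minimal Python sketch. It is not the authors' implementation: the prompt template, the naive sentence splitter, the function names (`build_question`, `decompose`, `evaluate`), and the `answer_yes_prob` callable that stands in for an instruction-tuned PLM are all hypothetical. The visible abstract does not say how subquestion answers are combined, so the sketch simply averages them as a placeholder.

```python
from typing import Callable, List


def build_question(context: str, text: str, dimension: str) -> str:
    """Format one instruction-style yes/no question (hypothetical template)."""
    return (
        "Answer the following yes/no question.\n"
        f"Context: {context}\n"
        f"Text: {text}\n"
        f"Is this text {dimension}?"
    )


def split_sentences(text: str) -> List[str]:
    """Naive period-based splitter; a real system would use a proper tokenizer."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def decompose(context: str, text: str, dimension: str) -> List[str]:
    """Assumption: one subquestion per sentence of the generated text."""
    return [build_question(context, s, dimension) for s in split_sentences(text)]


def evaluate(
    context: str,
    text: str,
    dimension: str,
    answer_yes_prob: Callable[[str], float],
) -> float:
    """Score a text by averaging the stand-in model's 'yes' probability.

    Averaging is a placeholder aggregation chosen for this sketch, not the
    procedure described in the paper.
    """
    probs = [answer_yes_prob(q) for q in decompose(context, text, dimension)]
    return sum(probs) / len(probs) if probs else 0.0
```

In this sketch the per-subquestion answers double as the interpretable evidence: each one can be inspected alongside the final score. Swapping `answer_yes_prob` for a call to an actual instruction-tuned model is left open, since the model and decoding setup are not specified in the text above.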

Code & Models

Repositories

kepei1106/decompeval
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications