DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
Pei Ke, Fei Huang, Fei Mi, Yasheng Wang, Qun Liu, Xiaoyan Zhu, Minlie Huang

TL;DR
DecompEval is an unsupervised evaluation metric for natural language generation that uses instruction-tuned language models to assess text quality through decomposed, interpretable subquestions, achieving state-of-the-art results on text summarization and dialogue generation evaluation.
Contribution
Proposes DecompEval, an unsupervised, instruction-based evaluation method that enhances generalization and interpretability without training on evaluation datasets.
Findings
Achieves state-of-the-art performance among untrained metrics in text summarization and dialogue generation evaluation.
Demonstrates strong generalization across different tasks and evaluation dimensions.
Provides interpretable evidence through subquestion decomposition.
Abstract
Existing evaluation metrics for natural language generation (NLG) tasks face the challenges on generalization ability and interpretability. Specifically, most of the well-performed metrics are required to train on evaluation datasets of specific NLG tasks and evaluation dimensions, which may cause over-fitting to task-specific datasets. Furthermore, existing metrics only provide an evaluation score for each dimension without revealing the evidence to interpret how this score is obtained. To deal with these challenges, we propose a simple yet effective metric called DecompEval. This metric formulates NLG evaluation as an instruction-style question answering task and utilizes instruction-tuned pre-trained language models (PLMs) without training on evaluation datasets, aiming to enhance the generalization ability. To make the evaluation process more interpretable, we decompose our devised…
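The idea of treating evaluation as question answering over decomposed subquestions can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract above is truncated before it says how the question is decomposed, so the sketch assumes one subquestion per sentence, and it scores a text as the fraction of "yes" answers, which is a simplification chosen for clarity. The names `split_sentences`, `build_subquestions`, `evaluate`, and the `answer_fn` callback (a stand-in for an instruction-tuned PLM) are all hypothetical.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_subquestions(text: str, dimension: str) -> List[str]:
    """Assumption: one yes/no subquestion per sentence of the generated text."""
    return [
        f'Is this sentence {dimension}? Sentence: "{s}"'
        for s in split_sentences(text)
    ]


def evaluate(text: str, dimension: str, answer_fn: Callable[[str], bool]) -> float:
    """Score = fraction of subquestions answered 'yes' (illustrative only).

    answer_fn is a hypothetical stand-in for querying an instruction-tuned
    model; it receives a subquestion and returns True for 'yes'.
    """
    questions = build_subquestions(text, dimension)
    if not questions:
        return 0.0
    answers = [answer_fn(q) for q in questions]
    return sum(1 for a in answers if a) / len(answers)


if __name__ == "__main__":
    # Toy answerer: flags any sentence containing "colorless" as not fluent.
    def toy_answerer(question: str) -> bool:
        return "colorless" not in question

    sample = "The cat sat on the mat. Colorless ideas sleep colorless."
    for q in build_subquestions(sample, "fluent"):
        print(q, "->", "yes" if toy_answerer(q) else "no")
    print("score:", evaluate(sample, "fluent", toy_answerer))
```

The per-sentence answers double as the interpretable evidence: each one can be inspected to see which part of the text lowered the score. In a real setting, `answer_fn` would wrap a call to an instruction-tuned model rather than a keyword check.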
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
