DEE: Dual-stage Explainable Evaluation Method for Text Generation
Shenyu Zhang, Yu Li, Rui Wu, Xiutian Huang, Yongrui Chen, Wenhao Xu, Guilin Qi

TL;DR
DEE is a dual-stage, explainable evaluation method for text generation that efficiently identifies errors and provides detailed diagnostics, improving over existing methods especially in industrial contexts.
Contribution
The paper introduces DEE, a novel dual-stage evaluation framework built on Llama 2, together with a new dataset, AntEval, enabling comprehensive and explainable assessment of generated texts.
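As a rough illustration of what a dual-stage evaluator might look like in code, the sketch below runs a cheap detection pass first and a diagnostic pass second. Everything in it is an assumption made for illustration: the prompt templates, the `Evaluation` type, the `generate` callable (a stand-in for any text-generation model such as a Llama 2 wrapper), and the choice to skip the second stage when no error is flagged. None of it is taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage-specific instructions; the paper's actual prompts are not shown here.
DETECT_TEMPLATE = "Does the following text contain errors? Answer yes or no.\n\nText: {text}"
EXPLAIN_TEMPLATE = "List and explain the errors in the following text.\n\nText: {text}"


@dataclass
class Evaluation:
    has_error: bool
    diagnostics: Optional[str]  # None when stage 1 finds nothing to explain


def evaluate(text: str, generate: Callable[[str], str]) -> Evaluation:
    """Two-pass evaluation: quick error detection, then detailed diagnostics."""
    # Stage 1: a short yes/no answer keeps the common "no error" path cheap.
    verdict = generate(DETECT_TEMPLATE.format(text=text)).strip().lower()
    if not verdict.startswith("yes"):
        return Evaluation(has_error=False, diagnostics=None)
    # Stage 2 (assumed to run only on flagged texts): ask for an explanation.
    explanation = generate(EXPLAIN_TEMPLATE.format(text=text)).strip()
    return Evaluation(has_error=True, diagnostics=explanation)


def stub_model(prompt: str) -> str:
    """Toy stand-in for a real model so the sketch runs without any dependencies."""
    if prompt.startswith("Does"):
        return "yes" if "moon is made of cheese" in prompt else "no"
    return "Hallucination: the claim about the moon is not factual."


flagged = evaluate("The moon is made of cheese.", stub_model)
clean = evaluate("Water is wet.", stub_model)
```

In practice `stub_model` would be replaced by a call to an actual model; the two-pass control flow is the only point of the example.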
Findings
DEE correlates more closely with human judgments than existing evaluation methods.
DEE achieves higher efficiency in error detection.
The AntEval dataset covers new issues like hallucination and toxicity.
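Correlation with human judgments is usually measured with a rank correlation between a metric's scores and human ratings of the same texts. The sketch below computes Kendall's tau (the tau-a variant) in plain Python. The ratings and metric scores are made-up numbers for illustration only, and whether the paper reports Kendall, Spearman, or Pearson correlation is not stated in this summary.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs. Tied pairs count as neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Illustrative data only: human ratings for five texts and two hypothetical metrics.
human = [1, 2, 3, 4, 5]
metric_a = [0.1, 0.4, 0.35, 0.8, 0.9]
metric_b = [0.5, 0.2, 0.6, 0.3, 0.7]

tau_a = kendall_tau(human, metric_a)  # 9 concordant, 1 discordant pair -> 0.8
tau_b = kendall_tau(human, metric_b)  # 7 concordant, 3 discordant pairs -> 0.4
```

A metric with the higher tau orders texts more like the human raters do, which is the sense in which one evaluation method "outperforms" another on human correlation.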
Abstract
Automatic methods for evaluating machine-generated texts hold significant importance due to the expanding applications of generative systems. Conventional methods tend to grapple with a lack of explainability, issuing a solitary numerical score to signify the assessment outcome. Recent advancements have sought to mitigate this limitation by incorporating large language models (LLMs) to offer more detailed error analyses, yet their applicability remains constrained, particularly in industrial contexts where comprehensive error coverage and swift detection are paramount. To alleviate these challenges, we introduce DEE, a Dual-stage Explainable Evaluation method for estimating the quality of text generation. Built upon Llama 2, DEE follows a dual-stage principle guided by stage-specific instructions to perform efficient identification of errors in generated texts in the initial stage and…
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques
Methods: LLaMA
