Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang, Hang Zhou, Kaiyan Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Yue Zhang, and Jingbo Zhu

TL;DR
This paper introduces CSEM, a novel method for training sequence evaluation models using large language models to generate labeled data, eliminating the need for human annotations and improving evaluation accuracy across various scenarios.
Contribution
The paper proposes CSEM, a three-stage training approach that leverages large language models for data generation, supporting diverse evaluation types and enhancing sequence quality assessment.
Findings
CSEM effectively trains evaluation models without human-labeled data.
Metrics developed via CSEM outperform traditional metrics in sequence quality.
CSEM improves evaluation accuracy in reinforcement learning and reranking tasks.
Abstract
Automatic evaluation of sequence generation, traditionally reliant on metrics like BLEU and ROUGE, often fails to capture the semantic accuracy of generated text because these metrics emphasize n-gram overlap. A promising solution to this problem is to develop model-based metrics, such as BLEURT and COMET. However, these approaches are typically hindered by the scarcity of labeled evaluation data, which is necessary to train the evaluation models. In this work, we address this challenge by proposing the Customized Sequence Evaluation Metric (CSEM), a three-stage evaluation model training method that utilizes large language models to generate labeled data for model-based metric development, thereby eliminating the need for human-labeled data. Additionally, we expand the scope of CSEM to support various evaluation types, including single-aspect, multi-aspect, reference-free, and…
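To make the n-gram overlap limitation concrete, the following is a minimal sketch of clipped n-gram precision, the core quantity behind BLEU-style metrics. It is not BLEU itself (there is no brevity penalty and no geometric mean over n-gram orders), and it is unrelated to how CSEM scores text. The function names and the three example sentences are assumptions made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference. Tokenization is plain whitespace splitting.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return overlap / total


# Illustrative sentences (assumed, not from the paper).
reference = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"   # similar meaning, different words
scrambled = "on the mat the cat sat"          # same words, different meaning

for name, text in [("paraphrase", paraphrase), ("scrambled", scrambled)]:
    p1 = ngram_precision(text, reference, 1)
    p2 = ngram_precision(text, reference, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

In this toy case the paraphrase shares almost no n-grams with the reference and scores near zero, while the scrambled sentence scores high even though its meaning has changed. That gap between surface overlap and meaning is the motivation the abstract gives for model-based metrics.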
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
