RepEval: Effective Text Evaluation with LLM Representation

Shuqian Sheng; Yi Xu; Tianhang Zhang; Zanwei Shen; Luoyi Fu; Jiaxin Ding; Lei Zhou; Xiaoying Gan; Xinbing Wang; Chenghu Zhou

arXiv:2404.19563 · cs.CL · October 29, 2024

TL;DR

RepEval introduces a novel LLM representation-based evaluation metric that effectively assesses text quality across diverse scenarios with only a few samples, outperforming previous methods in correlation with human judgments.

Contribution

The paper proposes RepEval, a new metric leveraging LLM representations and direction vectors, enabling adaptable and low-cost text evaluation across multiple tasks.

Findings

01. RepEval achieves higher correlation with human judgments than previous metrics.
02. The method performs well across fourteen datasets and two evaluation tasks.
03. RepEval requires only a few sample pairs to construct direction vectors.
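
To make the direction-vector idea concrete, here is a minimal sketch under stated assumptions. It assumes each text has already been encoded as a fixed-length vector (for example, a hidden state taken from one layer of an LLM) and that each sample pair holds the representation of a higher-quality text and a lower-quality one. It takes the direction as the top singular vector of the stacked difference vectors (better minus worse), found by power iteration, and flips its sign so that better texts project higher. This construction and the helper `direction_from_pairs` are hypothetical illustrations, not code from the susisheng/repeval repository; the paper's exact procedure may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Sequence[float]) -> Vector:
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def direction_from_pairs(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]], iters: int = 100
) -> Vector:
    """Hypothetical helper: unit direction from (better, worse) representation pairs.

    Uses the top right-singular vector of the stacked difference vectors
    (better - worse), found by power iteration on D^T D.
    """
    if not pairs:
        raise ValueError("need at least one sample pair")
    diffs = [[g - b for g, b in zip(good, bad)] for good, bad in pairs]
    dim = len(diffs[0])
    # Start from the mean difference, which usually overlaps the top component.
    mean = [sum(d[j] for d in diffs) / len(diffs) for j in range(dim)]
    v = _normalize(mean) if any(mean) else _normalize([1.0] * dim)
    for _ in range(iters):
        # One power-iteration step: v <- D^T (D v), then rescale to unit length.
        proj = [_dot(d, v) for d in diffs]
        v = _normalize([sum(p * d[j] for p, d in zip(proj, diffs)) for j in range(dim)])
    # Orient the direction so that better texts project higher on average.
    if sum(_dot(d, v) for d in diffs) < 0:
        v = [-x for x in v]
    return v


# Toy 2-D "representations": the first coordinate separates better from worse texts.
pairs = [
    ([2.0, 0.1], [0.0, 0.0]),
    ([3.0, -0.1], [1.0, 0.0]),
    ([1.5, 0.2], [-0.5, 0.1]),
]
direction = direction_from_pairs(pairs)
print([round(x, 3) for x in direction])
```

With only three pairs the recovered direction already aligns with the coordinate that separates the better texts from the worse ones, which illustrates why a handful of pairs can suffice when the quality signal is roughly linear in representation space.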

Abstract

The era of Large Language Models (LLMs) raises new demands for automatic evaluation metrics, which should be adaptable to various application scenarios while remaining low-cost and effective. Traditional metrics for automatic text evaluation are often tailored to specific scenarios, while LLM-based evaluation metrics are costly: they require fine-tuning or rely heavily on the generation capabilities of LLMs. Moreover, previous LLM-based metrics ignore the fact that, within the space of LLM representations, there exist direction vectors that indicate text quality. To this end, we introduce RepEval, a metric that leverages the projection of LLM representations for evaluation. Through simple prompt modifications, RepEval can easily transition to various tasks, requiring only a few sample pairs to construct direction vectors. Results on fourteen datasets across two…

Code & Models

Repositories

susisheng/repeval
PyTorch · Official

Videos

RepEval: Effective Text Evaluation with LLM Representation · Underline

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Dropout · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing · Residual Connection