RepEval: Effective Text Evaluation with LLM Representation
Shuqian Sheng, Yi Xu, Tianhang Zhang, Zanwei Shen, Luoyi Fu, Jiaxin Ding, Lei Zhou, Xiaoying Gan, Xinbing Wang, Chenghu Zhou

TL;DR
RepEval introduces a novel LLM representation-based evaluation metric that effectively assesses text quality across diverse scenarios with minimal samples, outperforming previous methods in correlation with human judgments.
Contribution
The paper proposes RepEval, a new metric leveraging LLM representations and direction vectors, enabling adaptable and low-cost text evaluation across multiple tasks.
Findings
RepEval achieves higher correlation with human judgments than previous metrics.
The method performs well across fourteen datasets and two evaluation tasks.
RepEval requires only minimal sample pairs for direction vector construction.
Abstract
The era of Large Language Models (LLMs) raises new demands for automatic evaluation metrics, which should be adaptable to various application scenarios while maintaining low cost and effectiveness. Traditional metrics for automatic text evaluation are often tailored to specific scenarios, while LLM-based evaluation metrics are costly, requiring fine-tuning or relying heavily on the generation capabilities of LLMs. Besides, previous LLM-based metrics ignore the fact that, within the space of LLM representations, there exist direction vectors that indicate the estimation of text quality. To this end, we introduce RepEval, a metric that leverages the projection of LLM representations for evaluation. Through simple prompt modifications, RepEval can easily transition to various tasks, requiring only minimal sample pairs for direction vector construction. Results on fourteen datasets across two…
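The projection idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the text above: the direction vector is taken as the normalized mean difference between representations of good and bad texts, and short toy vectors stand in for real LLM hidden states. The function names `direction_vector` and `projection_score` are hypothetical, and the paper's actual construction may differ.

```python
import math

def direction_vector(pairs):
    """Build a unit direction from (good_rep, bad_rep) pairs.

    Assumption: the direction is the normalized mean of the
    good-minus-bad differences. This is an illustrative choice,
    not necessarily the construction used by RepEval.
    """
    dim = len(pairs[0][0])
    mean_diff = [0.0] * dim
    for good, bad in pairs:
        for i in range(dim):
            mean_diff[i] += (good[i] - bad[i]) / len(pairs)
    norm = math.sqrt(sum(x * x for x in mean_diff))
    if norm == 0.0:
        raise ValueError("good and bad representations do not differ on average")
    return [x / norm for x in mean_diff]

def projection_score(rep, direction):
    """Score a text by projecting its representation onto the direction."""
    return sum(r * d for r, d in zip(rep, direction))

# Toy 3-dimensional vectors standing in for LLM representations (assumption).
toy_pairs = [
    ([1.0, 0.2, 0.0], [0.0, 0.2, 0.0]),
    ([0.9, 0.5, 0.1], [0.1, 0.5, 0.1]),
]
direction = direction_vector(toy_pairs)

fluent = projection_score([0.8, 0.3, 0.0], direction)
garbled = projection_score([0.1, 0.3, 0.0], direction)
print(fluent > garbled)
```

In this toy setup only the first coordinate separates good from bad texts, so the direction aligns with that axis and the higher-quality representation receives the larger score. With a real model, the vectors would come from hidden states obtained under a task-specific prompt.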
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Dropout · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing · Residual Connection
