BatchEval: Towards Human-like Text Evaluation

Peiwen Yuan; Shaoxiong Feng; Yiwei Li; Xinglin Wang; Boyuan Pan; Heda Wang; Kan Li

arXiv:2401.00437 · cs.CL · January 2, 2024 · 1 citation

TL;DR

BatchEval introduces a batch-wise evaluation paradigm for automatic text assessment using large language models, significantly improving robustness, consistency, and correlation with human judgment over traditional sample-wise methods.

Contribution

It proposes a novel batch-wise evaluation framework that addresses prompt sensitivity and noise issues, with an optimal two-stage procedure and heterogeneous batch strategy.

Findings

01 Outperforms state-of-the-art methods by 10.5% in Pearson correlation.

02 Achieves comparable performance with only 64% API cost.

03 Demonstrates robustness and generalization across multiple tasks and models.
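
The first finding is reported as Pearson correlation between evaluator scores and human ratings. As a reminder of what that metric measures, here is a minimal sketch in plain Python. The function name `pearson` and the score lists in the usage note are illustrative assumptions, not code or data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson(model_scores, human_scores)` approaches 1 when an evaluator ranks and spaces samples the way human annotators do, and approaches 0 when the two sets of scores are linearly unrelated.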

Abstract

Significant progress has been made in automatic text evaluation with the introduction of large language models (LLMs) as evaluators. However, the current sample-wise evaluation paradigm suffers from the following issues: (1) sensitivity to prompt design; (2) poor resistance to noise; (3) inferior ensemble performance with a static reference. Inspired by the fact that humans treat both criterion definition and inter-sample comparison as references for evaluation, we propose BatchEval, a paradigm that conducts batch-wise evaluation iteratively to alleviate the above problems. We explore variants under this paradigm and confirm that the optimal settings are a two-stage procedure with a heterogeneous batch composition strategy and a decimal scoring format. Comprehensive experiments across 3 LLMs on 4 text evaluation tasks demonstrate that BatchEval outperforms state-of-the-art methods by 10.5% on Pearson…
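
The abstract mentions a heterogeneous batch composition strategy without spelling out how batches are formed. One plausible reading, offered only as a sketch, is that each batch should mix samples from across the score range so the evaluator always has contrasting references. The function `heterogeneous_batches`, its parameters, and the round-robin dealing scheme below are assumptions made for illustration and are not taken from the paper or its repository.

```python
import math


def heterogeneous_batches(scores, batch_size):
    """Group sample indices into batches that each span the score range.

    Assumed scheme (hypothetical): sort indices by their current score,
    then deal them round-robin so every batch gets low, middle, and
    high scoring samples.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    if not scores:
        return []
    # Indices ordered from lowest to highest current score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    num_batches = math.ceil(len(order) / batch_size)
    # Batch b takes every num_batches-th index, starting at position b.
    return [order[b::num_batches] for b in range(num_batches)]
```

Under this assumed scheme, scores from a previous evaluation round could be passed back in to regroup the samples before the next round, which is one way the iterative, batch-wise procedure described above might be organized.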

Code & Models

Repositories

ypw0102/batcheval (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods