Auto-PRE: An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation

Junjie Chen; Weihang Su; Zhumin Chu; Haitao Li; Yujia Zhou; Dingbo Yuan; Xudong Wang; Jun Zhou; Yiqun Liu; Min Zhang; Shaoping Ma; Qingyao Ai

arXiv:2410.12265 · cs.CL · November 11, 2025

TL;DR

Auto-PRE is an automatic evaluation framework for language models that mimics peer review: it selects evaluator LLMs without human annotations, reducing costs and biases while maintaining state-of-the-art performance on summarization, non-factoid QA, and dialogue generation.

Contribution

It introduces an LLM evaluation method that automatically selects evaluator LLMs based on three traits (consistency, pertinence, and self-confidence), improving efficiency and scalability over approaches that rely on human annotations.

Findings

01. Auto-PRE achieves state-of-the-art results on summarization, QA, and dialogue tasks.

02. The framework significantly reduces evaluation costs.

03. It provides a scalable approach for automating LLM evaluation.

Abstract

The rapid development of large language models (LLMs) has highlighted the need for efficient and reliable methods to evaluate their performance. Traditional evaluation methods often face challenges like high costs, limited task formats, dependence on human references, and systematic biases. To address these limitations, we propose Auto-PRE, an automatic LLM evaluation framework inspired by the peer review process. Unlike previous approaches that rely on human annotations, Auto-PRE automatically selects evaluator LLMs based on three core traits: consistency, pertinence, and self-confidence, which correspond to the instruction, content, and response stages, respectively, and collectively cover the entire evaluation process. Experiments on three representative tasks, including summarization, non-factoid QA, and dialogue generation, demonstrate that Auto-PRE achieves state-of-the-art…
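The selection step described in the abstract can be pictured with a minimal sketch. Everything concrete in it is an assumption for illustration: the abstract does not say how the three traits are scored or combined, so the `TraitScores` fields, the [0, 1] scale, the thresholds, the candidate names, and the all-traits-must-pass rule are hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitScores:
    """Hypothetical per-candidate scores in [0, 1]; the scale is an assumption."""
    consistency: float      # tied to the instruction stage
    pertinence: float       # tied to the content stage
    self_confidence: float  # tied to the response stage


def select_evaluators(candidates, thresholds):
    """Return the sorted names of candidates whose every trait meets its threshold.

    Requiring all three traits to pass is an illustrative assumption,
    not a rule stated in the abstract.
    """
    selected = []
    for name, scores in candidates.items():
        if (scores.consistency >= thresholds.consistency
                and scores.pertinence >= thresholds.pertinence
                and scores.self_confidence >= thresholds.self_confidence):
            selected.append(name)
    return sorted(selected)


# Made-up candidate names and scores, for illustration only.
candidates = {
    "model_a": TraitScores(0.9, 0.8, 0.7),
    "model_b": TraitScores(0.9, 0.4, 0.9),
    "model_c": TraitScores(0.6, 0.7, 0.8),
}
thresholds = TraitScores(0.6, 0.5, 0.6)
print(select_evaluators(candidates, thresholds))  # ['model_a', 'model_c']
```

In this toy run, `model_b` is dropped because its pertinence score falls below the threshold even though its other two scores are high, which is the point of screening on all three stages rather than on one aggregate number.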

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling