Auto-PRE: An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
Junjie Chen, Weihang Su, Zhumin Chu, Haitao Li, Yujia Zhou, Dingbo Yuan, Xudong Wang, Jun Zhou, Yiqun Liu, Min Zhang, Shaoping Ma, Qingyao Ai

TL;DR
Auto-PRE is an automatic evaluation framework for large language models (LLMs) that mimics academic peer review, reducing evaluation costs and biases while achieving state-of-the-art performance across multiple tasks.
Contribution
It introduces an LLM evaluation method that automatically selects evaluator LLMs based on three traits (consistency, pertinence, and self-confidence), improving efficiency and scalability over traditional assessments that rely on human annotation.
Findings
Auto-PRE achieves state-of-the-art results on summarization, QA, and dialogue tasks.
The framework significantly reduces evaluation costs.
It provides a scalable approach for automating LLM evaluation.
Abstract
The rapid development of large language models (LLMs) has highlighted the need for efficient and reliable methods to evaluate their performance. Traditional evaluation methods often face challenges like high costs, limited task formats, dependence on human references, and systematic biases. To address these limitations, we propose Auto-PRE, an automatic LLM evaluation framework inspired by the peer review process. Unlike previous approaches that rely on human annotations, Auto-PRE automatically selects evaluator LLMs based on three core traits: consistency, pertinence, and self-confidence, which correspond to the instruction, content, and response stages, respectively, and collectively cover the entire evaluation process. Experiments on three representative tasks, including summarization, non-factoid QA, and dialogue generation, demonstrate that Auto-PRE achieves state-of-the-art…
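The trait-based selection described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the trait scores are taken as precomputed numbers in [0, 1], the names `Candidate`, `select_evaluators` and `aggregate_votes` are hypothetical, and the threshold filter followed by a majority vote is a stand-in, not the scoring or aggregation procedure defined in the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate evaluator LLM with hypothetical, precomputed trait scores in [0, 1]."""
    name: str
    consistency: float      # instruction stage (assumed metric)
    pertinence: float       # content stage (assumed metric)
    self_confidence: float  # response stage (assumed metric)


def select_evaluators(candidates, thresholds):
    """Keep only candidates whose every trait score meets its threshold.

    The threshold filter is an illustrative assumption, not the paper's rule.
    """
    return [
        c for c in candidates
        if c.consistency >= thresholds["consistency"]
        and c.pertinence >= thresholds["pertinence"]
        and c.self_confidence >= thresholds["self_confidence"]
    ]


def aggregate_votes(votes):
    """Combine selected evaluators' pairwise preferences ("A" or "B") by majority vote.

    On a tie, Counter.most_common returns the label that was counted first.
    """
    if not votes:
        raise ValueError("no votes to aggregate")
    return Counter(votes.values()).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical candidate pool; names and scores are made up for illustration.
    pool = [
        Candidate("model-1", 0.92, 0.81, 0.77),
        Candidate("model-2", 0.55, 0.90, 0.85),
        Candidate("model-3", 0.88, 0.74, 0.69),
    ]
    limits = {"consistency": 0.7, "pertinence": 0.7, "self_confidence": 0.6}
    chosen = select_evaluators(pool, limits)
    print([c.name for c in chosen])  # ['model-1', 'model-3']
    # Hypothetical judgments from the selected evaluators on one answer pair.
    votes = {"model-1": "A", "model-3": "A"}
    print(aggregate_votes(votes))  # A
```

Requiring all three thresholds at once mirrors the abstract's point that the traits jointly cover the instruction, content, and response stages; a weighted score would be an equally plausible hypothetical alternative.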
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
