PiCO: Peer Review in LLMs based on the Consistency Optimization

Kun-Peng Ning; Shuo Yang; Yu-Yang Liu; Jia-Yu Yao; Zhen-Hui Liu; Yong-Hong Tian; Yibing Song; Li Yuan

arXiv:2402.01830 · cs.CL · February 24, 2025 · 2 citations

TL;DR

This paper introduces PiCO, an unsupervised, peer-review-based evaluation method for LLMs. Models evaluate one another's answers, and the ranking is formalized as a constrained optimization problem over learnable capability parameters.

Contribution

The paper proposes a novel unsupervised peer-review framework for LLM evaluation, utilizing consistency optimization and learnable capability parameters to rank models without human annotations.

Findings

01. PiCO effectively ranks LLMs by capability using mutual evaluation.

02. The proposed metrics PEN, CIN, and LIS accurately measure alignment with human rankings.

03. Experiments validate PiCO's ability to distinguish model performance hierarchies.

Abstract

Existing evaluation methods for large language models (LLMs) typically focus on testing performance on closed-environment, domain-specific benchmarks with human annotations. In this paper, we explore a novel unsupervised evaluation direction that uses peer-review mechanisms to measure LLMs automatically. In this setting, both open-source and closed-source LLMs lie in the same environment, where they answer unlabeled questions and evaluate each other, and each LLM's response score is jointly determined by other anonymous models. To obtain the ability hierarchy among these models, we assign each LLM a learnable capability parameter to adjust the final ranking. We formalize this as a constrained optimization problem that maximizes the consistency between each LLM's capability and its scores. The key assumption is that a high-level LLM can evaluate others' answers more…
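The idea of tying each model's capability weight to the scores it receives can be pictured with a small sketch. This is not the paper's optimization procedure: it swaps the constrained optimization for a simple fixed-point iteration, and the function name `consistency_ranking`, the score matrix `S`, and the toy numbers are all hypothetical, invented only for illustration.

```python
import numpy as np

def consistency_ranking(S, iters=100):
    """Toy fixed-point sketch (an assumption, not the paper's algorithm).

    S[i, j] is the mean score that reviewer model i gives to the
    answers of model j. Returns capability weights and a ranking.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    w = np.full(n, 1.0 / n)          # start with equal capability weights
    for _ in range(iters):
        g = w @ S                    # capability-weighted score each model receives
        w_new = g / g.sum()          # reuse normalized scores as the next weights
        if np.allclose(w_new, w, atol=1e-12):
            break
        w = w_new
    return w, np.argsort(-w)         # ranking: highest weight first

# Invented toy scores: rows are reviewers, columns are reviewed models.
S = [[0.9, 0.6, 0.3],
     [0.8, 0.5, 0.3],
     [0.5, 0.5, 0.5]]
weights, ranking = consistency_ranking(S)
print(weights, ranking)
```

In this toy setting, reviewers whose own answers score well gain more influence over the final scores, which mirrors the consistency assumption stated in the abstract. A fuller treatment would likely exclude self-review and optimize the weights under explicit constraints.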

Code & Models

Repositories

PKU-YuanGroup/Peer-review-in-LLMs
PyTorch · Official

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Focus