PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning, Shuo Yang, Yu-Yang Liu, Jia-Yu Yao, Zhen-Hui Liu, Yong-Hong Tian, Yibing Song, Li Yuan

TL;DR
This paper introduces PiCO, an unsupervised, peer-review-based evaluation method for LLMs. Models answer unlabeled questions and evaluate one another's answers, and the ranking is formalized as a constrained optimization problem that orders models by capability.
Contribution
The paper proposes a novel unsupervised peer-review framework for LLM evaluation, utilizing consistency optimization and learnable capability parameters to rank models without human annotations.
Findings
PiCO effectively ranks LLMs by capability using mutual evaluation.
The proposed metrics PEN, CIN, and LIS accurately measure alignment with human rankings.
Experiments validate PiCO's ability to distinguish model performance hierarchies.
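The findings above compare a learned ranking against a human one. As a minimal sketch of how such agreement can be measured, the code below assumes that CIN and LIS stand for count inversions and longest increasing subsequence; this listing does not expand the acronyms, so that reading is an assumption, and PEN is not covered. The model names and both rankings are hypothetical.

```python
from bisect import bisect_left


def to_positions(predicted, reference):
    """Map each model in `predicted` to its index in `reference`."""
    index = {name: i for i, name in enumerate(reference)}
    return [index[name] for name in predicted]


def count_inversions(seq):
    """Count pairs (i, j) with i < j and seq[i] > seq[j]; 0 means identical order."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def longest_increasing_subsequence(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k + 1
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


# Hypothetical data: a human reference ranking and a ranking produced by peer review.
reference = ["model_a", "model_b", "model_c", "model_d"]
predicted = ["model_a", "model_c", "model_b", "model_d"]

positions = to_positions(predicted, reference)
inversions = count_inversions(positions)
lis_length = longest_increasing_subsequence(positions)
```

Fewer inversions and a longer increasing subsequence both indicate closer agreement with the reference ranking.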
Abstract
Existing evaluation methods for large language models (LLMs) typically focus on testing performance on closed-environment, domain-specific benchmarks with human annotations. In this paper, we explore a novel unsupervised evaluation direction that uses peer-review mechanisms to measure LLMs automatically. In this setting, both open-source and closed-source LLMs lie in the same environment, where they answer unlabeled questions and evaluate each other, and each LLM's response score is jointly determined by the other, anonymous models. To obtain the ability hierarchy among these models, we assign each LLM a learnable capability parameter to adjust the final ranking. We formalize this as a constrained optimization problem that maximizes the consistency between each LLM's capability and its scores. The key assumption behind this is that a high-level LLM can evaluate others' answers more…
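The idea of weighting each reviewer by a capability parameter that should agree with the score it receives can be illustrated with a simple fixed-point iteration. This is only a sketch under stated assumptions, not the paper's optimizer: the function name `peer_review_ranking`, the score matrix, and the update rule (set each weight to the model's normalized peer score) are all hypothetical choices made for illustration.

```python
def peer_review_ranking(scores, iterations=50):
    """Iteratively align reviewer weights with the scores the reviewers receive.

    scores[j][i] is the (hypothetical) average score reviewer j gives to
    model i's answers. Self-reviews on the diagonal are ignored.
    """
    n = len(scores)
    weights = [1.0 / n] * n  # start with every reviewer equally trusted
    for _ in range(iterations):
        graded = []
        for i in range(n):
            reviewers = [j for j in range(n) if j != i]
            total = sum(weights[j] for j in reviewers)
            # Weighted mean of the scores model i receives from its peers.
            graded.append(sum(weights[j] * scores[j][i] for j in reviewers) / total)
        # Assumed update rule: a model's weight as reviewer tracks its own score.
        norm = sum(graded)
        weights = [g / norm for g in graded]
    ranking = sorted(range(n), key=lambda i: graded[i], reverse=True)
    return ranking, weights, graded


# Hypothetical score matrix for three models; row = reviewer, column = reviewed model.
scores = [
    [0.9, 0.6, 0.2],
    [0.8, 0.7, 0.3],
    [0.5, 0.4, 0.6],
]
ranking, weights, graded = peer_review_ranking(scores)
```

In this toy setup the weakest model's inflated score for itself is ignored and its opinions count for less over time, so the ranking is driven mostly by the reviewers that their peers score highly.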
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Focus
