Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge

Qiyuan Zhang; Yufei Wang; Yuxin Jiang; Liangyou Li; Chuhan Wu; Yasheng Wang; Xin Jiang; Lifeng Shang; Ruiming Tang; Fuyuan Lyu; Chen Ma

arXiv:2502.12501·cs.CL·April 8, 2025

Open Access

TL;DR

This paper introduces a crowd-based comparative evaluation method for LLM-as-a-Judge. Comparing the candidate responses against additional crowd responses exposes details the judge would otherwise miss, which yields more accurate and more comprehensive judgments.

Contribution

It proposes a novel crowd-based comparative evaluation approach that enhances the depth, accuracy, and reliability of LLM judgments beyond existing majority voting methods.

Findings

01. Achieves an average accuracy gain of 6.7% across five benchmarks.

02. Produces higher-quality, more comprehensive chains of thought.

03. Improves evaluation accuracy as inference scales.

Abstract

LLM-as-a-Judge, which generates chain-of-thought (CoT) judgments, has become a widely adopted auto-evaluation method. However, its reliability is compromised by the CoT reasoning's inability to capture comprehensive and deeper details, often leading to incomplete outcomes. Existing methods mainly rely on majority voting or criteria expansion, which are insufficient to address this limitation of CoT. We propose Crowd-based Comparative Evaluation, which introduces additional crowd responses to compare with the candidate responses, thereby exposing deeper and more comprehensive details within the candidate responses. This process effectively guides LLM-as-a-Judge to provide a more detailed CoT judgment. Extensive experiments demonstrate that our approach enhances evaluation reliability, achieving an average accuracy gain of 6.7% across five benchmarks. Moreover, our method produces…
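
The idea in the abstract can be illustrated with a minimal sketch, assuming a pairwise setting with two candidate responses. Everything named here is hypothetical and not taken from the paper: the `judge` callable (any function that maps a prompt string to the judge model's text), the function names `crowd_notes` and `evaluate`, and the prompt wording. The sketch compares each candidate against each crowd response, then passes the resulting notes to the final A-versus-B judgment as extra context; how the paper actually selects or filters crowd judgments is not stated in this abstract.

```python
from typing import Callable, List

# Hypothetical interface: takes a prompt, returns the judge model's text.
Judge = Callable[[str], str]


def crowd_notes(instruction: str, candidates: List[str],
                crowd: List[str], judge: Judge) -> List[str]:
    """Compare every candidate with every crowd response and keep the judge's notes."""
    notes = []
    for i, cand in enumerate(candidates):
        for crowd_resp in crowd:
            prompt = (
                f"Instruction: {instruction}\n"
                f"Response X: {cand}\n"
                f"Response Y: {crowd_resp}\n"
                "Compare X and Y in detail and explain which is better."
            )
            notes.append(f"Candidate {i + 1} vs. crowd: {judge(prompt)}")
    return notes


def evaluate(instruction: str, candidate_a: str, candidate_b: str,
             crowd: List[str], judge: Judge) -> str:
    """Judge A against B, using the crowd comparison notes as additional context."""
    notes = crowd_notes(instruction, [candidate_a, candidate_b], crowd, judge)
    final_prompt = (
        f"Instruction: {instruction}\n"
        f"Response A: {candidate_a}\n"
        f"Response B: {candidate_b}\n"
        "Notes from comparisons against other responses:\n"
        + "\n".join(notes)
        + "\nUsing these notes, reason step by step and decide whether A or B is better."
    )
    return judge(final_prompt)
```

With two crowd responses, this sketch makes four comparison calls followed by one final judgment call, so the cost grows with the number of crowd responses.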

Taxonomy

Topics: Law, AI, and Intellectual Property · Artificial Intelligence in Law