Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge

Chiyu Ma; Enpei Zhang; Yilun Zhao; Wenjun Liu; Yaning Jia; Peijun Qing; Lin Shi; Arman Cohan; Yujun Yan; Soroush Vosoughi

arXiv:2505.19477 · cs.AI · September 19, 2025

TL;DR

This paper systematically analyzes how different biases manifest and amplify in multi-agent LLM-as-Judge systems, comparing debate and meta-judging frameworks, and evaluates the effectiveness of a debiasing method in these contexts.

Contribution

It provides the first comprehensive analysis of bias amplification and resistance in multi-agent LLM evaluation frameworks, and assesses a debiasing method's effectiveness within these systems.

Findings

01

Debate amplifies biases sharply after the initial round, and the increase persists in later rounds.

02

Meta-judge approaches show greater bias resistance.

03

Debiasing with PINE reduces biases in debate but less so in meta-judging.
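Findings 01 and 02 contrast two different information flows. The sketch below illustrates them in a generic, simplified form; it is not the paper's implementation. The `Judge` callable, the toy agents `stubborn` and `conformist`, and the use of a majority vote in place of an LLM meta-judge are all hypothetical. In the debate loop assumed here, every agent sees all verdicts from the previous round; in the meta-judging path, judges answer independently and a single aggregator decides.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical judge interface (an assumption, not the paper's API): given two
# candidate answers and, optionally, the verdicts all agents gave in the
# previous round, return "A" or "B".
Judge = Callable[[str, str, Optional[List[str]]], str]


def debate(judges: List[Judge], answer_a: str, answer_b: str, rounds: int) -> List[List[str]]:
    """Multi-agent debate: from round 2 on, every agent sees the previous round's verdicts."""
    history: List[List[str]] = []
    previous: Optional[List[str]] = None
    for _ in range(rounds):
        current = [judge(answer_a, answer_b, previous) for judge in judges]
        history.append(current)
        previous = current
    return history


def meta_judge(judges: List[Judge], answer_a: str, answer_b: str) -> str:
    """Meta-judging: judges answer independently, then one aggregator decides.

    Majority vote is a simplified stand-in for an LLM meta-judge that reads
    and evaluates the individual judgments.
    """
    verdicts = [judge(answer_a, answer_b, None) for judge in judges]
    return Counter(verdicts).most_common(1)[0][0]


def stubborn(verdict: str) -> Judge:
    """Toy judge that always returns the same verdict."""
    def judge(answer_a: str, answer_b: str, previous: Optional[List[str]]) -> str:
        return verdict
    return judge


def conformist(answer_a: str, answer_b: str, previous: Optional[List[str]]) -> str:
    """Toy judge that starts with "B" and then adopts the previous round's majority."""
    if previous is None:
        return "B"
    return Counter(previous).most_common(1)[0][0]


if __name__ == "__main__":
    panel = [stubborn("A"), stubborn("A"), conformist]
    # Prints [['A', 'A', 'B'], ['A', 'A', 'A'], ['A', 'A', 'A']]
    print(debate(panel, "answer 1", "answer 2", rounds=3))
    # Prints A
    print(meta_judge(panel, "answer 1", "answer 2"))
```

In this toy run the conformist agent drops its initial "B" verdict after one round of seeing its peers' votes, a crude picture of how exposure to other agents can propagate a shared preference in debate, whereas the meta-judging path never shows agents each other's verdicts. It illustrates only the information flow and does not reproduce the paper's results.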

Abstract

LLM-as-Judge has emerged as a scalable alternative to human evaluation, enabling large language models (LLMs) to provide reward signals in training. While recent work has explored multi-agent extensions such as multi-agent debate and meta-judging to enhance evaluation quality, the question of how intrinsic biases manifest in these settings remains underexplored. In this study, we conduct a systematic analysis of four diverse bias types: position bias, verbosity bias, chain-of-thought bias, and bandwagon bias. We evaluate these biases across two widely adopted multi-agent LLM-as-Judge frameworks: Multi-Agent-Debate and LLM-as-Meta-Judge. Our results show that the debate framework amplifies biases sharply after the initial debate, and this increased bias is sustained in subsequent rounds, while meta-judge approaches exhibit greater resistance. We further investigate the incorporation of…
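The abstract does not spell out how each bias is scored. As a generic illustration only, position bias is often probed by presenting the same pair of answers in both orders and checking whether the verdict follows the content or the slot. The sketch below computes such a swap-inconsistency rate; the `PairJudge` interface, the metric name `position_inconsistency`, and the toy judges are hypothetical and may differ from the paper's protocol.

```python
from typing import Callable, List, Tuple

# Hypothetical pairwise judge (an assumption): returns "first" or "second"
# for the presentation slot it prefers.
PairJudge = Callable[[str, str], str]


def preferred_answer(judge: PairJudge, first: str, second: str) -> str:
    """Map the judge's slot choice back to the answer text it picked."""
    return first if judge(first, second) == "first" else second


def position_inconsistency(judge: PairJudge, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of pairs whose winner changes when the two answers swap places.

    0.0 means the verdict never depends on order; higher values indicate
    more position bias under this assumed metric.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    flips = sum(
        preferred_answer(judge, a, b) != preferred_answer(judge, b, a)
        for a, b in pairs
    )
    return flips / len(pairs)


def always_first(first: str, second: str) -> str:
    """Toy judge with maximal position bias."""
    return "first"


def longer_wins(first: str, second: str) -> str:
    """Toy judge with verbosity bias: it prefers the longer answer in either slot."""
    return "first" if len(first) >= len(second) else "second"


if __name__ == "__main__":
    pairs = [("short", "a much longer answer"), ("yes", "no way")]
    print(position_inconsistency(always_first, pairs))  # 1.0
    print(position_inconsistency(longer_wins, pairs))   # 0.0
```

Note that `longer_wins` is perfectly order-consistent yet entirely length-driven, so a swap test alone cannot detect verbosity bias; each bias type needs its own probe.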
