Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge

Lin Shi; Chiyu Ma; Wenhua Liang; Xingjian Diao; Weicheng Ma; Soroush Vosoughi

arXiv:2406.07791 · cs.CL · November 12, 2025 · 5 cites

TL;DR

This study systematically investigates position bias in LLM-based judges across various tasks, revealing factors influencing bias and its impact on evaluation reliability, with implications for improving automated judging systems.

Contribution

The paper introduces three metrics for measuring position bias (repetition stability, position consistency, and preference fairness) and analyzes the sources of bias and how it varies across judges and tasks in LLM-based evaluation.

Findings

01 Position bias is statistically significant and varies across judges and tasks.

02 Position bias is only weakly affected by prompt length but strongly affected by the quality gap between solutions.

03 Analysis suggests potential dataset modifications to mitigate bias effects.

Abstract

LLM-as-a-Judge has emerged as a promising alternative to human evaluators across various tasks, yet inherent biases, particularly position bias (the tendency to favor solutions based on their position within the prompt), compromise its reliability. This exploratory study evaluates position bias in LLM judges across pairwise and list-wise comparison settings, introducing three metrics: repetition stability, position consistency, and preference fairness. Our experiments, involving 15 LLM judges across MTBench and DevBench with 22 tasks and approximately 40 solution-generating models, result in over 150,000 evaluation instances. We identify Judge-Level, Candidate-Level, and Task-Level factors contributing to bias. The findings confirm that position bias is not due to random chance and varies significantly across judges and tasks. While position bias is weakly influenced by the length of…
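
The abstract names the three metrics without defining them, so the following Python sketch is only a rough illustration of how such quantities might be computed in the pairwise setting. Every definition here is an assumption rather than the paper's formula: repetition stability is taken as the share of repeated judgments that match the most frequent choice, position consistency as the share of pairs where the judge picks the same candidate after the order is swapped, and the paper's preference fairness score is replaced by a plain count of primacy versus recency flips. The function names and the "A"/"B" labels are hypothetical.

```python
from collections import Counter


def repetition_stability(repeated_choices):
    """Share of repeated judgments on one identical query that match
    the most frequent choice (assumed definition, not the paper's)."""
    if not repeated_choices:
        raise ValueError("need at least one judgment")
    most_common_count = Counter(repeated_choices).most_common(1)[0][1]
    return most_common_count / len(repeated_choices)


def position_consistency(pairs):
    """Share of (original, swapped) judgment pairs in which the judge
    prefers the same candidate in both orderings (assumed definition)."""
    if not pairs:
        raise ValueError("need at least one judgment pair")
    return sum(1 for original, swapped in pairs if original == swapped) / len(pairs)


def preference_direction(pairs):
    """Count inconsistent pairs by direction. Candidate "A" is shown first
    in the original ordering and second in the swapped ordering.
    This is a simplified stand-in for preference fairness."""
    # Picked whichever candidate was shown first in both orderings.
    primacy = sum(1 for o, s in pairs if o == "A" and s == "B")
    # Picked whichever candidate was shown second in both orderings.
    recency = sum(1 for o, s in pairs if o == "B" and s == "A")
    return primacy, recency


# Hypothetical judgments: each tuple is (winner in original order,
# winner after swapping the two candidates).
example_pairs = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]
pc = position_consistency(example_pairs)
primacy_count, recency_count = preference_direction(example_pairs)
rs = repetition_stability(["A", "A", "B", "A"])
```

Ties and list-wise comparisons are left out of this sketch; handling them would need the definitions from the full paper.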

Code & Models

Repositories

Slimshilin/Position-Bias-Analyzer-Demo
Official

Taxonomy

Topics: Legal Education and Practice Innovations · Law, Economics, and Judicial Systems

Methods: Attention Is All You Need · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer