Evaluating Scoring Bias in LLM-as-a-Judge

Qingquan Li; Shaoyu Dou; Kailai Shao; Chao Chen; Haixiang Hu

arXiv:2506.22316 · cs.CL · May 22, 2026

TL;DR

This paper investigates scoring bias in LLM-based evaluation systems, identifying new types of biases, proposing a framework to measure them, and demonstrating their impact on model judgments.

Contribution

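The paper presents the first dedicated examination of scoring bias in LLM judges. It formally defines scoring bias, identifies three new bias types (rubric order bias, score ID bias, and reference answer score bias), proposes a framework to quantify them, and provides empirical evidence.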

Findings

01

Advanced LLMs exhibit significant scoring biases.

02

The proposed metrics effectively quantify different bias types.

03

Insights enable improved prompt design to reduce biases.

Abstract

The "LLM-as-a-Judge" paradigm, using Large Language Models (LLMs) as automated evaluators, is pivotal to LLM development, offering scalable feedback for complex tasks. However, the reliability of these judges is compromised by various biases. Existing research has heavily concentrated on biases in comparative evaluations. In contrast, scoring-based evaluations, which assign an absolute score and are often more practical in industrial applications, remain under-investigated. To address this gap, we undertake the first dedicated examination of scoring bias in LLM judges. We shift the focus from biases tied to the evaluation targets to those originating from the scoring prompt itself. We formally define scoring bias and identify three novel, previously unstudied types: rubric order bias, score ID bias, and reference answer score bias. We propose a comprehensive framework to quantify these…
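To make the idea of bias "originating from the scoring prompt itself" concrete, the following is a minimal sketch of how one might generate prompt variants that differ only in rubric order and score labels. It is not the paper's framework: the rubric text, the three label schemes, and the names `build_prompt` and `prompt_variants` are all hypothetical, and reference answer score bias is not covered here.

```python
# Illustrative sketch only: the rubric text, ID schemes, and function names
# below are hypothetical and are not taken from the paper.

RUBRIC = [
    (1, "The answer is wrong or irrelevant."),
    (2, "The answer is partially correct but has major gaps."),
    (3, "The answer is mostly correct with minor issues."),
    (4, "The answer is fully correct and well explained."),
]

# Assumed alternative labels for the same four score levels.
ID_SCHEMES = {
    "arabic": ["1", "2", "3", "4"],
    "letters": ["A", "B", "C", "D"],
    "roman": ["I", "II", "III", "IV"],
}


def build_prompt(question, answer, id_scheme="arabic", descending=False):
    """Render a scoring prompt; only rubric order and score labels vary."""
    labels = ID_SCHEMES[id_scheme]
    rows = [(labels[score - 1], text) for score, text in RUBRIC]
    if descending:
        rows = rows[::-1]  # same rubric content, listed from best to worst
    rubric_block = "\n".join(f"{label}: {text}" for label, text in rows)
    return (
        "Score the answer using the rubric below. "
        "Reply with the score label only.\n\n"
        f"Rubric:\n{rubric_block}\n\n"
        f"Question: {question}\nAnswer: {answer}\n"
    )


def prompt_variants(question, answer):
    """Return every (id_scheme, descending) combination of the same prompt."""
    return {
        (scheme, descending): build_prompt(question, answer, scheme, descending)
        for scheme in ID_SCHEMES
        for descending in (False, True)
    }
```

Under this setup, a judge whose score depends only on the answer should pick the same rubric level for all six variants. Systematic differences across variants would point to a bias introduced by the prompt rather than by the evaluated answer.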

Taxonomy

Topics: Artificial Intelligence in Law · Topic Modeling · Computational and Text Analysis Methods