Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

Benjamin Feuer; Lucas Rosenblatt; Oussama Elachqar

arXiv:2603.05485·cs.AI·March 6, 2026

TL;DR

This paper introduces average bias-boundedness (A-BB), an algorithmic framework that formally guarantees a reduction in the harm caused by any measurable bias in an LLM judge.

Contribution

The A-BB framework targets a gap the authors identify in the literature: no existing LLM-judge system enforces standards with strong guarantees, particularly when bias vectors are unknown or adversarially discovered.

Findings

01. Achieved bias-bounded guarantees on Arena-Hard-Auto with four LLM judges while keeping high correlation with the original rankings.

02. Retained 61-99% ranking correlation across the bias settings tested.

03. Most judge-bias combinations exceeded 80% correlation.
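The findings above are stated in terms of ranking correlation between the original and the bias-bounded judge scores. As a rough illustration only, the sketch below computes a Spearman rank correlation in plain Python. The choice of Spearman is an assumption (this page does not say which correlation measure the paper uses), and the function names and score values are hypothetical, not taken from the paper.

```python
from statistics import mean


def average_ranks(scores):
    """Return 1-based ranks (highest score gets rank 1), averaging ranks over ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to cover the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical scores for five models (illustrative values only).
original_scores = [71.2, 64.5, 58.0, 49.3, 40.1]
adjusted_scores = [66.0, 67.1, 55.4, 50.2, 43.0]  # top two models swap places

rho = spearman(original_scores, adjusted_scores)
print(f"rank correlation: {rho:.2f}")
```

In this toy case a single swap among five models gives a correlation of 0.90, which would correspond to "90%" in the way the findings are phrased. The sketch does not handle the degenerate case where all scores in a list are tied.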

Abstract

As AI models progress beyond simple chatbots into more complex workflows, we draw ever closer to the event horizon beyond which AI systems will be utilized in autonomous, self-maintaining feedback loops. Any autonomous AI system will depend on automated, verifiable rewards and feedback; in settings where ground truth is sparse or non-deterministic, one practical source of such rewards is an LLM-as-a-Judge. Although LLM judges continue to improve, the literature has yet to introduce systems capable of enforcing standards with strong guarantees, particularly when bias vectors are unknown or adversarially discovered. To remedy this issue, we propose average bias-boundedness (A-BB), an algorithmic framework which formally guarantees reductions of harm/impact as a result of any measurable bias in an LLM judge. Evaluating on Arena-Hard-Auto with four LLM judges, we achieve (tau=0.5,…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI