Self-Preference Bias in Rubric-Based Evaluation of Large Language Models
José Pombal, Ricardo Rei, André F. T. Martins

TL;DR
This paper investigates self-preference bias in rubric-based evaluation of large language models, revealing its persistence and impact on model scoring, even with objective criteria and ensemble methods.
Contribution
It is the first study to analyze self-preference bias in rubric-based LLM evaluation, highlighting its effects and mitigation strategies.
Findings
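Self-preference bias (SPB) persists with objective rubrics: among rubrics the generator fails, judges can be up to 50% more likely to incorrectly mark them as satisfied when the output is their own.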
Ensembling judges reduces but does not eliminate SPB.
SPB can skew model scores by up to 10 points in subjective evaluations.
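To make the ensembling finding concrete, here is a minimal sketch of one way to combine binary rubric verdicts from several judges. The strict-majority rule and the name `majority_verdict` are assumptions for illustration; this summary does not state how the paper aggregates judges.

```python
def majority_verdict(votes):
    """Return True if strictly more than half of the judges mark the rubric as satisfied.

    `votes` is a list of booleans, one per judge (hypothetical input format).
    """
    return sum(votes) * 2 > len(votes)


# Toy case: the rubric is actually failed. One self-judging model wrongly
# says "satisfied", but two unrelated judges outvote it.
one_biased = majority_verdict([True, False, False])

# If two of the three judges share the generator's family and both are
# biased, the wrong verdict survives the vote.
two_biased = majority_verdict([True, True, False])
```

In the first toy case the ensemble corrects the biased judge; in the second it does not, which is one intuition for why ensembling can reduce SPB without eliminating it.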
Abstract
LLM-as-a-judge has become the de facto approach for evaluating LLM outputs. However, judges are known to exhibit self-preference bias (SPB): they tend to favor outputs produced by themselves or by models from their own family. This skews evaluations and, thus, hinders model development, especially in settings of recursive self-improvement. We present the first study of SPB in rubric-based evaluation, an increasingly popular benchmarking paradigm where judges issue binary verdicts on individual evaluation criteria, instead of assigning holistic scores or rankings. Using IFEval, a benchmark with programmatically verifiable rubrics, we show that SPB persists even when evaluation criteria are entirely objective: among rubrics where generators fail, judges can be up to 50% more likely to incorrectly mark them as satisfied when the output is their own. We also find that, similarly to other…
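The "more likely to incorrectly mark them as satisfied" comparison in the abstract can be sketched as a ratio of false-positive rates on failed rubrics. This is a minimal illustration under stated assumptions, not the authors' code: the `Verdict` record, its field names, and the toy data are hypothetical, and "own output" is simplified to an exact judge/generator name match (the abstract also mentions same-family models).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    judge: str                  # model acting as judge
    generator: str              # model that produced the output
    judge_says_satisfied: bool  # judge's binary verdict on one rubric
    truly_satisfied: bool       # ground truth from a programmatic check


def false_positive_rate(verdicts):
    """Share of truly failed rubrics that the judge marked as satisfied."""
    failed = [v for v in verdicts if not v.truly_satisfied]
    if not failed:
        return None
    return sum(v.judge_says_satisfied for v in failed) / len(failed)


def self_preference_ratio(verdicts, judge):
    """False-positive rate on the judge's own outputs divided by the rate on others'."""
    own = [v for v in verdicts if v.judge == judge and v.generator == judge]
    other = [v for v in verdicts if v.judge == judge and v.generator != judge]
    fpr_own = false_positive_rate(own)
    fpr_other = false_positive_rate(other)
    if fpr_own is None or not fpr_other:
        return None  # undefined without failed rubrics or with a zero baseline
    return fpr_own / fpr_other


# Made-up toy data: judge "A" scores its own outputs and those of model "B".
toy = [
    Verdict("A", "A", True, False),
    Verdict("A", "A", True, False),
    Verdict("A", "A", False, False),
    Verdict("A", "A", False, False),  # own outputs: 2 of 4 failures passed
    Verdict("A", "B", True, False),
    Verdict("A", "B", False, False),
    Verdict("A", "B", False, False),
    Verdict("A", "B", False, False),  # other outputs: 1 of 4 failures passed
    Verdict("A", "B", True, True),    # correctly satisfied, so it is ignored
]
ratio = self_preference_ratio(toy, "A")
```

A ratio of 1.0 would mean no measurable self-preference on failed rubrics; a ratio of 1.5 corresponds to a judge being 50% more likely to wrongly pass its own output.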
