Beyond the Surface: Measuring Self-Preference in LLM Judgments
Zhi-Yuan Chen, Hao Wang, Xinyu Zhang, Enrui Hu, Yankai Lin

TL;DR
This paper introduces the DBG score, a metric for self-preference bias in large language models. It uses gold judgments as proxies for response quality, so that a judge's preference for its own responses is not mistaken for those responses simply being better.
Contribution
The paper proposes the DBG score, which isolates self-preference bias from response quality using gold judgments, and applies it to analyze factors affecting bias in LLMs.
Findings
The DBG score separates self-preference bias from response quality.
Self-preference bias varies with model size and reasoning ability.
Response style and post-training data influence bias levels.
Abstract
Recent studies show that large language models (LLMs) exhibit self-preference bias when serving as judges, meaning they tend to favor their own responses over those generated by other models. Existing methods typically measure this bias by calculating the difference between the scores a judge model assigns to its own responses and those it assigns to responses from other models. However, this approach conflates self-preference bias with response quality, as higher-quality responses from the judge model may also lead to positive score differences, even in the absence of bias. To address this issue, we introduce gold judgments as proxies for the actual quality of responses and propose the DBG score, which measures self-preference bias as the difference between the scores assigned by the judge model to its own responses and the corresponding gold judgments. Since gold judgments reflect…
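The contrast between the two measurements in the abstract can be made concrete with a small sketch. It assumes pointwise numeric scores (for example on a 1 to 10 scale) and a per-response average. The function names `naive_self_preference` and `dbg_score` are hypothetical, and the paper's exact formulation (scoring format, aggregation, normalization) is not stated in the excerpt above, so treat this as an illustration of the idea rather than the authors' definition.

```python
from statistics import mean


def naive_self_preference(judge_own_scores, judge_other_scores):
    """Existing approach (as described in the abstract): the mean score the
    judge gives its own responses minus the mean score it gives other
    models' responses. A positive gap can come from bias OR from quality."""
    return mean(judge_own_scores) - mean(judge_other_scores)


def dbg_score(judge_own_scores, gold_own_scores):
    """Hypothetical pointwise DBG sketch: the mean gap between the score the
    judge assigns to each of its own responses and the gold judgment of that
    same response. Gold judgments stand in for true quality, so only the
    judge's excess over them is counted as self-preference."""
    if len(judge_own_scores) != len(gold_own_scores):
        raise ValueError("each judge score needs a matching gold judgment")
    return mean(j - g for j, g in zip(judge_own_scores, gold_own_scores))


if __name__ == "__main__":
    # Illustrative (made-up) scores: the judge's own responses really are
    # better than the other model's, and the gold judgments agree.
    judge_own = [9, 8, 9, 7]    # judge scoring its own responses
    judge_other = [6, 5, 6, 5]  # judge scoring another model's responses
    gold_own = [9, 8, 8, 7]     # gold judgments of the judge's own responses

    print(naive_self_preference(judge_own, judge_other))  # 2.75
    print(dbg_score(judge_own, gold_own))                 # 0.25
```

In this toy case the naive difference (2.75) is dominated by a genuine quality gap, while the gold-anchored difference (0.25) reflects only the one response the judge over-scored relative to its gold judgment.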
