Are LLM Evaluators Really Narcissists? Sanity Checking Self-Preference Evaluations
Dani Roytburg, Matthew Bozoukov, Matthew Nguyen, Jou Barzdukas, Mackenzie Puig-Hall, Narmeen Oozeer

TL;DR
This paper investigates the self-preference bias in LLM evaluators, identifies a core confound affecting measurements, and introduces a baseline to improve the accuracy of bias detection in automated evaluations.
Contribution
It uncovers a key methodological confound in measuring LLM self-preference bias and proposes a baseline to decouple true bias from noisy responses, improving evaluation reliability.
Findings
Only 51% of initial bias findings remain significant after correction.
Correcting for a core methodological confound could reduce measurement error by 89.6%.
The baseline helps isolate genuine self-preference signals.
Abstract
Recent research has shown that large language models (LLMs) favor their own outputs when acting as judges, undermining the integrity of automated post-training and evaluation workflows. However, it is difficult to disentangle which evaluation biases are explained by narcissism versus general experimental confounds, distorting measurements of self-preference bias. We discover a core methodological confound which could reduce measurement error by 89.6%. Specifically, LLM evaluators may deliver self-preferring verdicts when the judge responds to queries which they completed incorrectly themselves; this would be true regardless of whether one of their responses is their own. To decouple self-preference signals from noisy outputs on hard problems, we introduce an Evaluator Quality Baseline, which compares the probability that a judge incorrectly votes for itself against the probability that…
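The confound described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The abstract is cut off before it names the second probability in the Evaluator Quality Baseline, so the comparison term used here is an assumption for illustration, suggested by the phrase "regardless of whether one of their responses is their own": the rate at which the judge votes for an incorrect response written by a different model. The `Verdict` record, both function names, and the toy counts are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """One pairwise judgment in which exactly one candidate response is incorrect.

    Hypothetical record layout, assumed for this sketch only.
    """
    judge_wrote_incorrect: bool  # True if the incorrect response is the judge's own
    voted_for_incorrect: bool    # True if the judge picked the incorrect response


def incorrect_vote_rate(verdicts, judge_wrote_incorrect):
    """Fraction of verdicts in the chosen condition where the judge picked the wrong answer."""
    subset = [v for v in verdicts if v.judge_wrote_incorrect == judge_wrote_incorrect]
    if not subset:
        return float("nan")
    return sum(v.voted_for_incorrect for v in subset) / len(subset)


def self_preference_gap(verdicts):
    """Incorrect-vote rate on the judge's own wrong answers minus the assumed baseline.

    The baseline (wrong answers written by another model) is an assumption;
    the source text is truncated before it defines the comparison term.
    """
    own = incorrect_vote_rate(verdicts, judge_wrote_incorrect=True)
    baseline = incorrect_vote_rate(verdicts, judge_wrote_incorrect=False)
    return own - baseline


# Toy, made-up counts: the judge errs often in both conditions,
# so most of its apparent "self-preference" is plain evaluator error.
verdicts = (
    [Verdict(True, True)] * 6 + [Verdict(True, False)] * 4
    + [Verdict(False, True)] * 5 + [Verdict(False, False)] * 5
)
gap = self_preference_gap(verdicts)
```

Under these assumptions, a raw measurement would report the judge choosing its own incorrect answer 60% of the time, while the gap over the baseline is much smaller, which is the kind of separation the abstract attributes to the baseline.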
Taxonomy
Topics: Topic Modeling · Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing
