Quantifying and Mitigating Self-Preference Bias of LLM Judges
Jinming Yang, Zheng Hu, Chuxian Qiu, Zhenyu Deng, Xinshan Jiao, Tao Zhou

TL;DR
This paper introduces an automated framework to measure and reduce Self-Preference Bias in LLM-based evaluations, improving trustworthiness and scalability of automated judging systems.
Contribution
It proposes a novel, fully automated method to quantify and mitigate Self-Preference Bias without human annotations, enhancing large-scale evaluation reliability.
Findings
Advanced LLMs often exhibit high or negative correlation with low SPB.
The proposed mitigation strategy reduces SPB by 31.5% on average.
Empirical analysis across 20 mainstream LLMs demonstrates the effectiveness of the approach.
Abstract
LLM-as-a-Judge has become a dominant approach in automated evaluation systems, playing critical roles in model alignment, leaderboard construction, quality control, and so on. However, the scalability and trustworthiness of this approach can be substantially distorted by Self-Preference Bias (SPB), a directional evaluative deviation in which LLMs systematically favor or disfavor their own generated outputs during evaluation. Existing measurements rely on costly human annotations and conflate generative capability with evaluative stance, and are thus impractical for large-scale deployment in real-world systems. To address this issue, we introduce a fully automated framework for quantifying and mitigating SPB, which constructs equal-quality pairs of responses with negligible quality differences, enabling statistical disentanglement of discriminability from bias propensity without…
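To make the equal-quality-pair idea concrete, here is a minimal sketch of how a directional self-preference score could be computed from a judge's verdicts. This is an illustration only, not the paper's estimator: the function name `self_preference_score`, the verdict labels, the score definition (2 × self-pick rate − 1), and the normal-approximation z statistic are all assumptions introduced here. The reasoning is that if the two responses in each pair are of equal quality, an unbiased judge should pick its own response about half the time, so a systematic deviation from 0.5 in either direction reflects bias rather than discriminability.

```python
from math import sqrt


def self_preference_score(verdicts):
    """Hypothetical SPB-style score on equal-quality pairs (illustrative only).

    verdicts: iterable of "self", "other", or "tie", one per pair, where
    "self" means the judge picked its own response.
    Returns (score, z): score lies in [-1, 1], positive when the judge favors
    its own outputs and negative when it disfavors them; z is a
    normal-approximation test statistic against an unbiased rate of 0.5.
    """
    # Ties carry no directional information, so they are dropped here
    # (an assumption of this sketch).
    decided = [v for v in verdicts if v in ("self", "other")]
    if not decided:
        raise ValueError("no decided verdicts to score")

    n = len(decided)
    p_self = sum(v == "self" for v in decided) / n

    # Rescale the self-pick rate from [0, 1] to a signed score in [-1, 1].
    score = 2 * p_self - 1

    # Under the null hypothesis p = 0.5, the standard error is sqrt(0.25 / n).
    z = (p_self - 0.5) / sqrt(0.25 / n)
    return score, z


if __name__ == "__main__":
    # Made-up verdicts purely to exercise the function.
    example = ["self"] * 70 + ["other"] * 30 + ["tie"] * 10
    print(self_preference_score(example))
```

A signed score is used rather than an absolute one because the abstract defines SPB as directional: a judge may systematically favor or disfavor its own outputs.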
