Mitigating the Bias of Large Language Model Evaluation
Hongli Zhou, Hui Huang, Yunfei Long, Bing Xu, Conghui Zhu, Hailong Cao, Muyun Yang, Tiejun Zhao

TL;DR
This paper investigates and mitigates the bias in Large Language Model-based evaluation methods, proposing calibration and contrastive training techniques to improve fairness without sacrificing accuracy.
Contribution
It introduces systematic bias mitigation strategies for LLM-as-a-Judge, addressing superficial quality bias in both closed-source and open-source models.
Findings
Bias is significantly reduced by calibration and contrastive training.
Evaluation accuracy is maintained despite bias mitigation.
The proposed methods outperform baseline approaches in bias reduction.
Abstract
Recently, there has been a trend of evaluating the Large Language Model (LLM) quality in the flavor of LLM-as-a-Judge, namely leveraging another LLM to evaluate the current output quality. However, existing judges are proven to be biased, namely they would favor answers which present better superficial quality (such as verbosity, fluency) while ignoring the instruction following ability. In this work, we propose systematic research about the bias of LLM-as-a-Judge. Specifically, for closed-source judge models, we apply calibration to mitigate the significance of superficial quality, both on probability level and prompt level. For open-source judge models, we propose to mitigate the bias by contrastive training, with curated negative samples that deviate from instruction but present better superficial quality. We apply our methods on the bias evaluation benchmark, and experiment results…
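The two ideas named in the abstract can be illustrated with a toy sketch. This summary does not give the paper's actual formulas, so everything below is an assumption made for illustration: the function names, the `alpha` weight, the linear subtraction used for calibration, and the hinge-style loss standing in for contrastive training are hypothetical and are not taken from the authors' code.

```python
def calibrated_pick(judge_a, judge_b, surface_a, surface_b, alpha=1.0):
    """Pick the preferred answer after discounting superficial quality.

    judge_*   : raw judge scores for answers A and B (assumed inputs)
    surface_* : estimates of superficial quality such as fluency or verbosity
                (assumed inputs; how to obtain them is not specified here)
    alpha     : hypothetical weight on the superficial-quality term
    """
    score_a = judge_a - alpha * surface_a
    score_b = judge_b - alpha * surface_b
    return "A" if score_a >= score_b else "B"


def pairwise_hinge_loss(score_good, score_negative, margin=1.0):
    """Hinge loss that is zero once the instruction-following answer
    outscores the polished but off-instruction negative by `margin`."""
    return max(0.0, margin - (score_good - score_negative))


if __name__ == "__main__":
    # Answer A scores higher raw, but most of that comes from surface polish.
    print(calibrated_pick(0.9, 0.7, surface_a=0.8, surface_b=0.2, alpha=0.0))
    print(calibrated_pick(0.9, 0.7, surface_a=0.8, surface_b=0.2, alpha=0.5))
    # A judge that ranks the negative sample higher incurs a positive loss.
    print(pairwise_hinge_loss(score_good=0.5, score_negative=2.0))
```

With `alpha=0.0` the raw judge preference is returned unchanged; raising `alpha` shifts the decision toward the answer whose score depends less on surface features. The loss function shows the training-side counterpart: it penalizes a judge only when a negative sample that deviates from the instruction scores too close to, or above, the faithful answer.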
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
