Mitigating the Bias of Large Language Model Evaluation

Hongli Zhou; Hui Huang; Yunfei Long; Bing Xu; Conghui Zhu; Hailong Cao; Muyun Yang; Tiejun Zhao

arXiv:2409.16788·cs.CL·September 26, 2024

TL;DR

This paper investigates and mitigates the bias in Large Language Model-based evaluation methods, proposing calibration and contrastive training techniques to improve fairness without sacrificing accuracy.

Contribution

It introduces systematic bias mitigation strategies for LLM-as-a-Judge, addressing superficial quality bias in both closed-source and open-source models.

Findings

01. Bias is significantly reduced by calibration and contrastive training.

02. Evaluation accuracy is maintained despite bias mitigation.

03. Methods outperform baseline approaches in bias reduction.

Abstract

Recently, there has been a trend of evaluating Large Language Model (LLM) quality with LLM-as-a-Judge, namely leveraging another LLM to evaluate the current output quality. However, existing judges have been shown to be biased: they favor answers that present better superficial quality (such as verbosity and fluency) while ignoring instruction-following ability. In this work, we present a systematic study of the bias of LLM-as-a-Judge. Specifically, for closed-source judge models, we apply calibration to mitigate the significance of superficial quality, at both the probability level and the prompt level. For open-source judge models, we propose to mitigate the bias by contrastive training, with curated negative samples that deviate from the instruction but present better superficial quality. We apply our methods on the bias evaluation benchmark, and experiment results…
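The two mitigation routes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give formulas, so the subtraction-based calibration, the weight `alpha`, the pairwise logistic loss, and all function and parameter names below are assumptions chosen only to make the ideas concrete.

```python
import math


def calibrated_preference(judge_a, judge_b, surface_a, surface_b, alpha=1.0):
    """Pick the preferred answer after discounting superficial quality.

    judge_*   : raw judge scores for answers A and B (hypothetical inputs)
    surface_* : estimates of superficial quality such as fluency or
                verbosity (hypothetical inputs; how they are obtained is
                not specified here)
    alpha     : assumed calibration weight; alpha=0 disables calibration
    """
    adjusted_a = judge_a - alpha * surface_a
    adjusted_b = judge_b - alpha * surface_b
    return "A" if adjusted_a >= adjusted_b else "B"


def pairwise_contrastive_loss(score_pos, score_neg):
    """Logistic pairwise loss: -log(sigmoid(score_pos - score_neg)).

    score_pos : judge score for the answer that follows the instruction
    score_neg : judge score for a curated negative that deviates from the
                instruction but looks better on the surface
    Written in a numerically stable softplus form.
    """
    diff = score_pos - score_neg
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


if __name__ == "__main__":
    # Answer A scores higher with the raw judge, but most of that score
    # is attributed to surface quality in this made-up example.
    print(calibrated_preference(0.9, 0.8, 0.5, 0.1, alpha=0.0))
    print(calibrated_preference(0.9, 0.8, 0.5, 0.1, alpha=1.0))
    # The loss shrinks as the instruction-following answer is ranked higher.
    print(pairwise_contrastive_loss(2.0, -1.0))
```

In this toy setup, the calibrated comparison can flip a verdict that was driven by surface quality, while the pairwise loss penalizes a trainable judge whenever the polished but off-instruction negative outscores the faithful answer.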

Code & Models

Repositories

Joe-Hall-Lee/Debias (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques