Explaining Length Bias in LLM-Based Preference Evaluations

Zhengyu Hu; Linxin Song; Jieyu Zhang; Zheyuan Xiao; Tianfu Wang; Zhengyu Chen; Nicholas Jing Yuan; Jianxun Lian; Kaize Ding; Hui Xiong

arXiv:2407.01085 · cs.LG · September 5, 2025 · 1 cites

TL;DR

This paper investigates length bias in LLM-based preference evaluations. It decomposes the evaluation metric (win rate) into desirability and information mass, and proposes AdapAlpaca to correct for length effects so that content quality is assessed fairly.

Contribution

The paper decomposes preference metrics into a length-independent component (desirability) and a length-dependent component (information mass), and proposes AdapAlpaca to mitigate length bias in evaluations.

Findings

01. Length impacts evaluation through information mass.

02. Decomposition clarifies bias sources.

03. AdapAlpaca improves fairness in preference assessments.

Abstract

The use of large language models (LLMs) as judges, particularly in preference comparisons, has become widespread, but such judges show a notable bias towards longer responses, which undermines the reliability of these evaluations. To better understand this bias, we propose to decompose the preference evaluation metric, specifically the win rate, into two key components: desirability and information mass. The former is length-independent and relates to trustworthiness, such as correctness, toxicity, and consistency; the latter is length-dependent and represents the amount of information in the response. We empirically demonstrate the decomposition through controlled experiments and find that response length impacts evaluations by influencing information mass. To derive a reliable evaluation metric that assesses content quality without being confounded by response length, we propose…
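
As a rough illustration of why length can confound a win rate, the sketch below compares a raw win rate with one computed only over pairs whose two responses have similar lengths. This is not the paper's AdapAlpaca procedure. The `Comparison` record, the `bucket_width` of 200 words, and the toy judgments are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One pairwise judgment (hypothetical layout, not from the paper)."""
    test_len: int    # length of the evaluated response, e.g. in words
    ref_len: int     # length of the reference response
    test_wins: bool  # True if the judge preferred the evaluated response


def win_rate(comparisons):
    """Fraction of comparisons won by the evaluated response; None if empty."""
    if not comparisons:
        return None
    return sum(c.test_wins for c in comparisons) / len(comparisons)


def length_matched(comparisons, bucket_width=200):
    """Keep only pairs whose two responses fall in the same length bucket.

    bucket_width is an illustrative assumption, not a value from the paper.
    """
    return [
        c for c in comparisons
        if c.test_len // bucket_width == c.ref_len // bucket_width
    ]


# Toy judgments invented for illustration only.
data = [
    Comparison(650, 150, True),   # much longer test response wins
    Comparison(700, 180, True),   # much longer test response wins
    Comparison(160, 170, False),  # similar lengths, test loses
    Comparison(420, 450, True),   # similar lengths, test wins
    Comparison(130, 190, False),  # similar lengths, test loses
]

raw = win_rate(data)
matched = win_rate(length_matched(data))
print(f"raw win rate: {raw:.2f}, length-matched win rate: {matched:.2f}")
```

In this toy data the raw win rate is inflated by the two pairs where the evaluated response is far longer than the reference. Restricting the comparison to similar-length pairs removes that effect, at the cost of discarding some judgments.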

Taxonomy

Topics: Rough Sets and Fuzzy Logic · Data Management and Algorithms · Semantic Web and Ontologies

Methods: Direct Preference Optimization