Bias Fitting to Mitigate Length Bias of Reward Model in RLHF

Kangwen Zhao; Jianfeng Cai; Jinhua Zhu; Ruopei Sun; Dongyun Xue; Wengang Zhou; Li Li; Houqiang Li

arXiv:2505.12843 · cs.LG · May 20, 2025

TL;DR

This paper introduces FiMi-RM, a framework that learns and corrects length bias in reward models used in RLHF, leading to more balanced responses and improved alignment without verbosity.

Contribution

FiMi-RM is the first method to explicitly model and mitigate non-linear length bias in reward models for RLHF, enhancing alignment quality.

Findings

01. Reduces verbosity in language model responses.
02. Improves length-controlled win rate.
03. Produces a more balanced length-reward distribution.

Abstract

Reinforcement Learning from Human Feedback (RLHF) relies on reward models to align large language models with human preferences. However, RLHF often suffers from reward hacking, wherein policy learning exploits flaws in the trained reward model to maximize reward scores without genuinely aligning with human preferences. A significant example of such reward hacking is length bias, where reward models usually favor longer responses irrespective of actual response quality. Previous works on length bias have notable limitations: these approaches either mitigate bias without characterizing its form or simply assume a linear length-reward relation. To accurately model the intricate nature of length bias and facilitate more effective bias mitigation, we propose FiMi-RM (Bias Fitting to Mitigate Length Bias of Reward Model in RLHF), a framework that autonomously learns and corrects underlying…
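The abstract contrasts a linear length-reward assumption with a non-linear one. The toy sketch below illustrates why that distinction matters. It is not FiMi-RM; the abstract is truncated and does not describe the method. It assumes that a reward decomposes as quality plus a bias that depends only on length, fits that bias by least-squares polynomial regression of reward on length, and subtracts the fitted curve. The helpers `solve_linear`, `fit_poly`, `poly_eval`, `debias` and `pearson` are hypothetical. The saturating bias curve and the Gaussian "quality" scores are made-up synthetic data.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns [c0, c1, ...]."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear(ata, aty)


def poly_eval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def debias(rewards, lengths, degree):
    """Hypothetical helper: subtract a fitted length->reward curve from each reward."""
    scale = max(lengths)  # rescale lengths to (0, 1] for numerical stability
    xs = [l / scale for l in lengths]
    coeffs = fit_poly(xs, rewards, degree)
    fitted = [poly_eval(coeffs, x) for x in xs]
    return [r - f for r, f in zip(rewards, fitted)], fitted


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    rng = random.Random(0)
    lengths = list(range(20, 420, 20))
    # Synthetic, saturating length bias: reward rises quickly with length, then plateaus.
    true_bias = [2.0 * (1.0 - math.exp(-l / 100.0)) for l in lengths]
    quality = [rng.gauss(0.0, 0.3) for _ in lengths]
    rewards = [q + b for q, b in zip(quality, true_bias)]
    for degree in (1, 3):
        debiased, _ = debias(rewards, lengths, degree)
        # A degree-1 fit makes residuals uncorrelated with length (up to rounding)
        # even though the curved part of the bias is still present.
        print(degree, pearson(debiased, lengths), pearson(debiased, quality))
```

The degree-1 correction drives the Pearson correlation between corrected reward and length to zero by construction, yet it leaves the curved part of the bias in place. A zero linear correlation is therefore not evidence that length bias is gone. Note also that regressing reward on length absorbs any genuine link between quality and length. A real method has to separate the two, and this sketch makes no attempt to do so.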

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing

Methods: ALIGN