Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
Wei Shen, Rui Zheng, Wenyu Zhan, Jun Zhao, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR
This paper addresses length bias in reinforcement learning from human feedback, proposing a Product-of-Experts approach to improve reward models by separating length bias from human intent understanding, leading to better language model performance.
Contribution
The paper introduces a novel Product-of-Experts framework that isolates length bias in reward modeling, enhancing alignment of language models with human preferences.
Findings
Length bias often causes reward models to favor longer responses.
The proposed PoE method effectively separates length bias from human intent.
Experimental results show improved language model performance regardless of sequence length.
Abstract
Reinforcement learning from human feedback serves as a crucial bridge, aligning large language models with human and societal values. This alignment requires a vast corpus of human feedback to learn a reward model, which is subsequently used to finetune language models. However, we have identified that the reward model often finds shortcuts to bypass its intended objectives, misleadingly assuming that humans prefer longer responses. The emergence of length bias often induces the model to favor longer outputs, yet it doesn't equate to an increase in helpful information within these outputs. In this paper, we propose an innovative solution, applying the Product-of-Experts (PoE) technique to separate reward modeling from the influence of sequence length. In our framework, the main expert concentrates on understanding human intents, while the biased expert targets the identification and…
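To make the Product-of-Experts idea concrete, here is a minimal sketch of how two experts can be combined for a pairwise preference. It is not the paper's implementation. It assumes each expert scores a (chosen, rejected) pair with a Bradley-Terry style probability, sigmoid(margin), where the margin is reward(chosen) minus reward(rejected). The names main_margin, bias_margin, poe_preference_prob, and poe_loss are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def poe_preference_prob(main_margin: float, bias_margin: float) -> float:
    """P(chosen is preferred) under a renormalized product of two experts.

    Each margin is reward(chosen) - reward(rejected) from one expert
    (assumed Bradley-Terry form; an illustration, not the paper's code).
    """
    # Multiply the two experts' probabilities for each outcome...
    p_win = sigmoid(main_margin) * sigmoid(bias_margin)
    p_lose = sigmoid(-main_margin) * sigmoid(-bias_margin)
    # ...then renormalize over the two possible outcomes.
    return p_win / (p_win + p_lose)


def poe_loss(main_margin: float, bias_margin: float) -> float:
    """Negative log-likelihood of the observed preference under the product."""
    return -math.log(poe_preference_prob(main_margin, bias_margin))


if __name__ == "__main__":
    # Hypothetical margins: the main expert barely prefers the chosen reply,
    # while the biased (length-focused) expert strongly prefers it.
    a, b = 0.2, 1.5
    print(poe_preference_prob(a, b), sigmoid(a + b))  # the two values agree
    print(poe_loss(a, b), poe_loss(a, 0.0))
```

Under this assumption the renormalized product collapses to sigmoid(main_margin + bias_margin), so training on the product amounts to summing the two experts' reward margins. When the biased expert already explains a preference (for example, the chosen reply is simply longer), the loss is small and the main expert receives little gradient from that pair, which is the general mechanism by which PoE debiasing discourages the main expert from relying on the shortcut.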
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Explainable Artificial Intelligence (XAI)
