Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback

Wei Shen; Rui Zheng; Wenyu Zhan; Jun Zhao; Shihan Dou; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2310.05199 · cs.CL · November 30, 2023

TL;DR

This paper addresses length bias in reinforcement learning from human feedback (RLHF). It proposes a Product-of-Experts (PoE) approach that separates length bias from the reward model's understanding of human intent, which leads to better language model performance.

Contribution

The paper introduces a novel Product-of-Experts framework that isolates length bias in reward modeling, improving the alignment of language models with human preferences.

Findings

1. Length bias often causes reward models to favor longer responses.

2. The proposed PoE method effectively separates length bias from human intent.

3. Experimental results show improved language model performance regardless of sequence length.

Abstract

Reinforcement learning from human feedback serves as a crucial bridge, aligning large language models with human and societal values. This alignment requires a vast corpus of human feedback to learn a reward model, which is subsequently used to finetune language models. However, we have identified that the reward model often finds shortcuts that bypass its intended objective, wrongly assuming that humans prefer longer responses. This length bias induces the model to favor longer outputs, yet longer outputs do not necessarily contain more helpful information. In this paper, we propose applying the Product-of-Experts (PoE) technique to separate reward modeling from the influence of sequence length. In our framework, the main expert concentrates on understanding human intents, while the biased expert targets the identification and…
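The main-expert/biased-expert split can be illustrated with a minimal sketch of a Product of Experts over pairwise (Bradley-Terry) preferences. It assumes each expert outputs a scalar reward and that the probability of the human label is the normalized product of the two experts' preference probabilities. The length-only biased expert (`length_only_reward`) and its `weight` are hypothetical stand-ins; the paper's actual expert architectures and training details may differ.

```python
import math


def sigmoid(x: float) -> float:
    """Plain logistic function (adequate for moderate |x|)."""
    return 1.0 / (1.0 + math.exp(-x))


def length_only_reward(num_tokens: int, weight: float = 0.5) -> float:
    # Hypothetical biased expert: sees only response length, nothing about content.
    return weight * math.log1p(num_tokens)


def poe_preference_prob(delta_main: float, delta_bias: float) -> float:
    """P(chosen > rejected) under a normalized product of two Bradley-Terry experts.

    delta_* is each expert's reward margin (chosen minus rejected).
    """
    p_main, p_bias = sigmoid(delta_main), sigmoid(delta_bias)
    agree = p_main * p_bias                      # both experts prefer "chosen"
    disagree = (1.0 - p_main) * (1.0 - p_bias)   # both experts prefer "rejected"
    # Algebraically equal to sigmoid(delta_main + delta_bias).
    return agree / (agree + disagree)


def poe_loss(main_chosen, main_rejected, len_chosen, len_rejected):
    """Training loss: negative log-likelihood of the human label under the combined expert."""
    delta_main = main_chosen - main_rejected
    delta_bias = length_only_reward(len_chosen) - length_only_reward(len_rejected)
    return -math.log(poe_preference_prob(delta_main, delta_bias))


# Same main-expert margin (zero), different length gaps.
print(poe_loss(0.0, 0.0, 400, 20))   # smaller: length alone already "explains" the label
print(poe_loss(0.0, 0.0, 100, 100))  # log 2: no length signal, main expert must do the work
```

For two binary experts the normalized product reduces to sigmoid(delta_main + delta_bias), so the gradient reaching the main expert, -(1 - sigmoid(delta_main + delta_bias)), shrinks whenever length alone already accounts for the human preference. This is how a PoE setup discourages the main expert from learning the length shortcut. In the usual PoE debiasing recipe only the main expert is then kept as the reward signal; this sketch assumes that rather than taking it from the paper.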
