Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model

Yueqin Yin; Shentao Yang; Yujia Xie; Ziyi Yang; Yuting Sun; Hany Awadalla; Weizhu Chen; and Mingyuan Zhou

arXiv:2501.02790·cs.CL·January 7, 2025

TL;DR

This paper introduces a segment-level reward model for RLHF in language models, improving reward assignment by considering semantically complete text segments, leading to better alignment with human preferences.

Contribution

It proposes a novel segment-level reward learning approach that allows dynamic segmentation and dense reward interpolation, enhancing RLHF effectiveness.

Findings

01. Achieves competitive performance on three RLHF benchmarks.

02. Demonstrates the effectiveness of segment-based rewards through ablation studies.

03. Provides a scalable method compatible with standard preference datasets.

Abstract

Reinforcement learning from human feedback (RLHF) has been widely adopted to align language models (LMs) with human preferences. Prior RLHF work typically takes a bandit formulation, which, though intuitive, ignores the sequential nature of LM generation and can suffer from the sparse-reward issue. While recent work proposes dense token-level RLHF, treating each token as an action may be too fine-grained for proper reward assignment. In this paper, we seek to get the best of both by training and utilizing a segment-level reward model, which assigns a reward to each semantically complete text segment that spans a short sequence of tokens. For reward learning, our method allows dynamic text segmentation and is compatible with standard sequence-preference datasets. For effective RL-based LM training against segment reward, we generalize the classical scalar bandit reward normalizers into…
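
To make the segment-level idea concrete, here is a minimal Python sketch of how segment rewards might connect to a sequence-level preference loss and to dense per-token rewards. It is not the authors' implementation. The boundary flags are taken as given, the mean aggregation in `sequence_reward`, the Bradley-Terry style loss, and the even split in `spread_to_tokens` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from typing import List, Sequence


def split_into_segments(boundaries: Sequence[bool]) -> List[List[int]]:
    """Group token indices into contiguous segments.

    boundaries[i] is True when token i starts a new segment. How the
    boundaries are chosen is outside this sketch (assumed to be given).
    """
    segments: List[List[int]] = []
    for i, starts_new in enumerate(boundaries):
        if i == 0 or starts_new:
            segments.append([i])
        else:
            segments[-1].append(i)
    return segments


def sequence_reward(segment_rewards: Sequence[float]) -> float:
    """Aggregate segment rewards into one sequence score (assumed: mean)."""
    return sum(segment_rewards) / len(segment_rewards)


def preference_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Bradley-Terry style loss, -log sigmoid(r_chosen - r_rejected).

    This lets segment rewards be trained from ordinary sequence-level
    preference pairs (assumed formulation, for illustration only).
    """
    diff = sequence_reward(chosen) - sequence_reward(rejected)
    return math.log1p(math.exp(-diff))


def spread_to_tokens(segments: List[List[int]],
                     segment_rewards: Sequence[float]) -> List[float]:
    """Densify rewards by splitting each segment's reward evenly over its tokens.

    The even split is an assumption; other interpolation schemes are possible.
    """
    token_rewards = [0.0] * sum(len(seg) for seg in segments)
    for seg, reward in zip(segments, segment_rewards):
        for i in seg:
            token_rewards[i] = reward / len(seg)
    return token_rewards


# Five tokens split into two segments: tokens 0-2 and tokens 3-4.
segments = split_into_segments([True, False, False, True, False])
token_rewards = spread_to_tokens(segments, [0.9, -0.2])
```

With the even split, the per-token rewards in each segment sum back to that segment's reward, so the densified signal preserves the total reward of the response.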

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: ALIGN