Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Zeqiu Wu; Yushi Hu; Weijia Shi; Nouha Dziri; Alane Suhr; Prithviraj Ammanabrolu; Noah A. Smith; Mari Ostendorf; Hannaneh Hajishirzi

arXiv:2306.01693 · cs.CL · October 31, 2023 · 35 cites

TL;DR

This paper introduces Fine-Grained RLHF, a novel framework that uses detailed human feedback at the segment level to improve language model training, addressing limitations of holistic feedback.

Contribution

It proposes a new fine-grained reward modeling approach that provides segment-level feedback and multiple feedback types, enhancing language model alignment and customization.
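The idea of segment-level rewards from multiple feedback types can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `dense_rewards`, the feedback-type labels, the scores, and the weights are all hypothetical, and placing each segment's reward on its last token is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical container: for each feedback type, the token index where each
# segment ends and the score assigned to that segment.
Feedback = Dict[str, Tuple[List[int], List[float]]]


def dense_rewards(
    num_tokens: int,
    feedback: Feedback,
    weights: Dict[str, float],
) -> List[float]:
    """Build a per-token reward vector from segment-level scores.

    Each segment's score is placed on the segment's last token (an assumption
    of this sketch) and scaled by the weight of its feedback type.
    """
    rewards = [0.0] * num_tokens
    for feedback_type, (segment_ends, scores) in feedback.items():
        if len(segment_ends) != len(scores):
            raise ValueError(f"{feedback_type}: one score per segment expected")
        weight = weights[feedback_type]
        for end, score in zip(segment_ends, scores):
            if not 0 <= end < num_tokens:
                raise ValueError(f"{feedback_type}: segment end {end} out of range")
            rewards[end] += weight * score
    return rewards


# Illustrative values only: a 6-token output with two relevance segments
# (ending at tokens 2 and 5) and one factuality segment (ending at token 5).
example = dense_rewards(
    num_tokens=6,
    feedback={
        "relevance": ([2, 5], [1.0, -1.0]),
        "factuality": ([5], [-1.0]),
    },
    weights={"relevance": 0.5, "factuality": 1.0},
)
print(example)  # [0.0, 0.0, 0.5, 0.0, 0.0, -1.5]
```

In this sketch, a holistic reward corresponds to the special case of a single feedback type with one segment ending at the final token. Changing the entries of `weights` is one simple way to picture how different feedback types could be traded off against each other.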

Findings

01 Improved detoxification and long-form QA performance.

02 Enhanced ability to customize LM behaviors.

03 Validated effectiveness through automatic and human evaluations.

Abstract

Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human preference judgments on LM outputs are transformed into a learning signal - has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information on long text outputs; it does not indicate which aspects of the outputs influenced user preference; e.g., which parts contain what type(s) of errors. In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal. We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is…

Code & Models

Repositories

allenai/FineGrainedRLHF (PyTorch)

Videos

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification