The History and Risks of Reinforcement Learning and Human Feedback

Nathan Lambert; Thomas Krendl Gilbert; Tom Zick

arXiv:2310.13595 · cs.CY · November 29, 2023 · 2 cites

Open Access

TL;DR

This paper reviews the development, challenges, and sociotechnical aspects of reinforcement learning from human feedback (RLHF), emphasizing the need for transparency and further research into reward models used in large language models.

Contribution

The paper provides a comprehensive historical and conceptual analysis of RLHF, highlighting methodological tensions and proposing research directions to better understand reward models.

Findings

01. RLHF reward models are central but poorly understood.

02. There are ontological differences between costs, rewards, and preferences.

03. Transparency and further study are crucial for advancing RLHF understanding.

Abstract

Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of human preferences that acts as a reward function for optimization. This approach, which operates at the intersection of many stakeholders and academic disciplines, remains poorly understood. RLHF reward models are often cited as being central to achieving performance, yet very few descriptors of capabilities, evaluations, training methods, or open-source models exist. Given this lack of information, further study and transparency is needed for learned RLHF reward models. In this paper, we illustrate the complex history of optimizing preferences, and articulate lines of inquiry to understand the sociotechnical context of reward models. In particular, we…

Taxonomy

Topics: Software Engineering Research