Deep reinforcement learning from human preferences

Paul Christiano; Jan Leike; Tom B. Brown; Miljan Martic; Shane Legg; Dario Amodei

arXiv:1706.03741·stat.ML·February 20, 2023·508 cites

TL;DR

This paper introduces a method for training reinforcement learning agents from human preferences between pairs of trajectory segments, so that complex tasks can be learned without an explicit reward function.

Contribution

It shows that non-expert human preferences can guide RL agents in complex environments, cut the cost of human oversight enough to apply it to state-of-the-art RL systems, and teach agents novel behaviors.

Findings

01

Trained agents on Atari games and simulated robot locomotion tasks with human feedback on less than 1% of the agent's interactions with the environment.

02

Taught complex novel behaviors with about one hour of human time.

03

Learned behaviors in environments considerably more complex than those addressed by earlier work on learning from human feedback.

Abstract

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been…
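The central step, fitting a reward estimate so that it explains which of two segments the human preferred, can be sketched in a few lines. This is a minimal sketch, assuming a Bradley-Terry-style preference model over summed predicted rewards, a linear reward over hypothetical per-step feature vectors (in place of the neural network a real agent would use), and a noiseless synthetic labeller standing in for the human. The names `fit_reward`, `pref_prob`, `TRUE_W`, and the data generator are illustrative, not code from the paper.

```python
import math
import random

# Assumption: each step is a small feature vector and the learned reward is
# linear, r_hat(x) = w . x. This is a toy stand-in for a neural reward model.

def segment_return(w, segment):
    """Sum of predicted per-step rewards over one trajectory segment."""
    return sum(sum(wi * xi for wi, xi in zip(w, x)) for x in segment)

def feature_sum(segment):
    """Per-dimension sum of the features across a segment."""
    return [sum(col) for col in zip(*segment)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def pref_prob(w, seg1, seg2):
    """Bradley-Terry probability that seg1 is preferred over seg2."""
    return sigmoid(segment_return(w, seg1) - segment_return(w, seg2))

def fit_reward(prefs, dim, lr=0.05, epochs=200):
    """Minimise cross-entropy between predicted and labelled preferences.

    Each item is (seg1, seg2, mu), where mu is the probability mass the label
    puts on seg1 (1.0 or 0.0; 0.5 could encode "equally good", an assumption).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for seg1, seg2, mu in prefs:
            p = pref_prob(w, seg1, seg2)
            s1, s2 = feature_sum(seg1), feature_sum(seg2)
            for i in range(dim):
                # Gradient of the cross-entropy loss: (p - mu) * (s1_i - s2_i)
                w[i] -= lr * (p - mu) * (s1[i] - s2[i])
    return w

# Hypothetical labeller: prefers the segment with higher return under a hidden reward.
rng = random.Random(0)
TRUE_W = [1.0, -0.5]

def random_segment(length=5, dim=2):
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(length)]

def labelled_pair():
    a, b = random_segment(), random_segment()
    mu = 1.0 if segment_return(TRUE_W, a) > segment_return(TRUE_W, b) else 0.0
    return a, b, mu

train = [labelled_pair() for _ in range(200)]
held_out = [labelled_pair() for _ in range(200)]
w_hat = fit_reward(train, dim=2)
accuracy = sum(
    (pref_prob(w_hat, a, b) > 0.5) == (mu == 1.0) for a, b, mu in held_out
) / len(held_out)
print(w_hat, accuracy)
```

The learned weights recover the direction of the hidden reward from comparisons alone, without ever seeing reward values. In the full approach the learned reward then replaces the environment reward when training the RL agent; this sketch stops at the reward-fitting step.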

Code & Models

Models

🤗 crichalchemist/phi-humanity-welfare-function (Hugging Face model)

Videos

Deep Learning From Human Preferences | Two Minute Papers #196 · YouTube

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robot Manipulation and Learning