Deep reinforcement learning from human preferences
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

TL;DR
This paper introduces a method for training reinforcement learning agents using human preferences over trajectory pairs, enabling complex tasks to be learned efficiently without explicit reward functions.
Contribution
It demonstrates that human preferences can effectively guide RL agents in complex environments, reducing human oversight costs and enabling learning of novel behaviors.
Findings
Trained agents on Atari games and simulated robot locomotion while providing human feedback on less than 1% of the agent's interactions with the environment.
Achieved complex behaviors with approximately one hour of human input.
Handled behaviors and environments considerably more complex than those in earlier work on learning from human feedback.
Abstract
For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been…
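The abstract describes feedback given as preferences between pairs of trajectory segments. A minimal sketch of how such comparisons could be turned into a training signal for a learned reward model follows. It assumes a Bradley-Terry-style model in which the probability of preferring a segment grows with its summed predicted reward; the summary above does not spell out the loss, so this choice, along with the hypothetical names `segment_return`, `preference_probability`, and `preference_loss`, is an illustration and not the authors' implementation.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_return(rewards):
    """Sum of predicted per-step rewards over one trajectory segment."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Modelled probability that segment A is preferred over segment B.

    Assumption: a Bradley-Terry-style model, i.e. a logistic function of
    the difference between the two segments' summed predicted rewards.
    """
    return _sigmoid(segment_return(rewards_a) - segment_return(rewards_b))


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the modelled preference and a human label.

    label: 1.0 if the human preferred A, 0.0 if B, 0.5 if judged equal
    (this label encoding is an assumption made for the sketch).
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Hypothetical predicted rewards for two short segments.
seg_a = [0.5, 1.0, 0.8]
seg_b = [0.1, 0.2, 0.0]
p_a = preference_probability(seg_a, seg_b)
loss_if_a_preferred = preference_loss(seg_a, seg_b, 1.0)
loss_if_b_preferred = preference_loss(seg_a, seg_b, 0.0)
```

In this sketch, minimizing the loss over many labelled pairs would push the reward model to assign higher summed reward to the segments humans prefer, and an RL agent could then be trained against that learned reward in place of an explicit reward function.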
Videos
Deep Learning From Human Preferences | Two Minute Papers #196 · YouTube
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robot Manipulation and Learning
