Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

William Saunders; Girish Sastry; Andreas Stuhlmueller; Owain Evans

arXiv:1707.05173 · cs.AI · July 18, 2017 · 110 cites

TL;DR

This paper explores a method to ensure safe reinforcement learning by using human intervention and imitation learning to prevent catastrophic actions during training, demonstrating success in simple scenarios but highlighting scalability challenges.

Contribution

The paper formalizes human intervention in RL and proposes training a supervised learner to imitate the human's intervention decisions, which reduces the human labor needed to keep training safe.
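The two-phase idea (a human labels which actions to block, then a supervised learner imitates those labels and takes over) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the paper works with a Deep RL agent on Atari, whereas everything below (the five-cell corridor, the `human_would_block` stand-in, the lookup-table "blocker", and the `SAFE_ACTION` substitute) is a hypothetical simplification chosen only to show the data flow.

```python
import random

SAFE_ACTION = 0  # hypothetical "stay" action substituted for any blocked action


def human_would_block(state, action):
    """Stand-in for the human overseer (assumption of this sketch):
    stepping left (-1) from cell 0 is the toy catastrophe."""
    return state == 0 and action == -1


def step(state, action):
    """Move within a corridor of cells 0..4."""
    return min(max(state + action, 0), 4)


def collect_labels(num_steps, rng):
    """Oversight phase: log each (state, action) pair with the human's decision."""
    data, state = [], 2
    for _ in range(num_steps):
        action = rng.choice([-1, 0, 1])  # random policy stands in for the RL agent
        blocked = human_would_block(state, action)
        data.append(((state, action), blocked))
        if blocked:
            action = SAFE_ACTION  # the human intervenes before the action executes
        state = step(state, action)
    return data


def train_blocker(data):
    """Supervised phase: a majority-vote lookup table imitating the human's labels."""
    votes = {}
    for key, blocked in data:
        yes, total = votes.get(key, (0, 0))
        votes[key] = (yes + int(blocked), total + 1)
    table = {key: yes * 2 > total for key, (yes, total) in votes.items()}
    # Unseen pairs are blocked, a conservative default assumed for this sketch.
    return lambda state, action: table.get((state, action), True)


def run_with_blocker(blocker, num_steps, rng):
    """Deployment phase: the learned blocker replaces the human; count catastrophes."""
    catastrophes, state = 0, 2
    for _ in range(num_steps):
        action = rng.choice([-1, 0, 1])
        if blocker(state, action):
            action = SAFE_ACTION
        if human_would_block(state, action):
            catastrophes += 1
        state = step(state, action)
    return catastrophes


rng = random.Random(0)
blocker = train_blocker(collect_labels(500, rng))
print(run_with_blocker(blocker, 1000, rng))
```

In this toy setting the blocker cannot miss the catastrophe, because the catastrophic pair is either learned from the human's labels or blocked by the conservative default. A learned classifier over high-dimensional observations has no such guarantee, which is where the scalability concerns noted in the findings come in.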

Findings

01. Successfully prevented all catastrophes in simple Atari scenarios

02. Supervised imitation reduced human labor compared to manual intervention

03. Scalability issues arise with more complex environments and adversarial examples

Abstract

AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven't yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human "in the loop" and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent's learning (whereas an…

Code & Models

Repositories

gsastry/human-rl (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)