Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
William Saunders, Girish Sastry, Andreas Stuhlmueller, Owain Evans

TL;DR
This paper studies how to keep reinforcement learning safe during training: a human overseer intervenes to block catastrophic actions, and a supervised learner trained to imitate those interventions then takes over to reduce human labor. The scheme prevents all catastrophes when the class of catastrophes is simple, but it faces scalability challenges.
Contribution
It formalizes human intervention in RL and proposes a supervised learning approach to imitate intervention decisions, reducing human effort and improving safety in training.
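The intervention scheme can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy state space, the `human_overseer` rule, the `SAFE_ACTION` and `BLOCK_PENALTY` constants, and the lookup-table `TabularBlocker` (swapped in for a learned classifier) are all hypothetical choices made for illustration. The sketch assumes that a blocked action is replaced by a safe default and that the agent receives a penalty.

```python
# Hypothetical constants, chosen only for this illustration.
SAFE_ACTION = 0        # "do nothing" action substituted when a proposal is blocked
BLOCK_PENALTY = -1.0   # penalty the agent receives for a blocked proposal


def human_overseer(state, action):
    """Stand-in for the human: block stepping left (-1) from position 0."""
    return state == 0 and action == -1


class TabularBlocker:
    """Lookup-table imitation of the human's block/allow decisions.

    A real system would train a classifier; a table keeps the sketch small.
    """

    def __init__(self):
        self.labels = {}

    def fit(self, examples):
        # Each example is (state, action, blocked) recorded during oversight.
        for state, action, blocked in examples:
            self.labels[(state, action)] = blocked

    def __call__(self, state, action):
        # Unseen pairs default to "block", a conservative assumption.
        return self.labels.get((state, action), True)


def supervised_step(state, action, blocker):
    """Return (executed_action, penalty, was_blocked) for a proposed action."""
    if blocker(state, action):
        return SAFE_ACTION, BLOCK_PENALTY, True
    return action, 0.0, False


# Phase 1: the human labels every (state, action) pair in a tiny toy space.
labels = [
    (s, a, human_overseer(s, a))
    for s in range(5)
    for a in (-1, 0, 1)
]

# Phase 2: the imitation blocker replaces the human.
blocker = TabularBlocker()
blocker.fit(labels)

print(supervised_step(0, -1, blocker))  # catastrophic proposal is blocked
print(supervised_step(2, 1, blocker))   # harmless proposal passes through
```

Defaulting unseen pairs to "block" trades some exploration for safety. Whether that trade-off is appropriate depends on the environment and is not something the summary above specifies.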
Findings
Successfully prevented all catastrophes in simple Atari scenarios
Supervised imitation reduced human labor compared to manual intervention
Scalability issues arise with more complex environments and adversarial examples
Abstract
AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven't yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human "in the loop" and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent's learning (whereas an…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
