Safe Reinforcement Learning Using Advantage-Based Intervention

Nolan Wagener; Byron Boots; Ching-An Cheng

arXiv:2106.09110 · cs.LG · July 20, 2021 · 5 cites

TL;DR

This paper introduces SAILR, a safe reinforcement learning algorithm that ensures safety during training and deployment by using advantage-based interventions, outperforming existing methods in constraint adherence.

Contribution

SAILR is a novel safe RL algorithm that keeps the agent safe during training through advantage-based interventions and, after training, guarantees a policy that is safe and performs well without intervention.

Findings

01  SAILR significantly reduces constraint violations during training.

02  SAILR converges to a high-performing, safe policy.

03  The method provides safety guarantees during both training and deployment.

Abstract

Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and…
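
The intervention mechanism described in the abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading and not necessarily the paper's exact rule, that the agent has a risk estimate for a fallback ("backup") policy and that a proposed action is overridden when its advantage under that risk estimate exceeds a threshold. The names `AdvantageIntervention`, `q_unsafe`, `v_unsafe`, `backup_policy`, and `eta`, as well as the 1-D toy environment, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = float
Action = float


@dataclass
class AdvantageIntervention:
    """Hypothetical advantage-based action filter (illustrative only)."""
    # Assumed: estimated risk of taking action a in state s, then following the backup policy.
    q_unsafe: Callable[[State, Action], float]
    # Assumed: estimated risk of following the backup policy from state s.
    v_unsafe: Callable[[State], float]
    # Assumed: a fallback policy that is trusted to keep the agent safe.
    backup_policy: Callable[[State], Action]
    # Assumed: intervention threshold on the risk advantage.
    eta: float = 0.1

    def advantage(self, s: State, a: Action) -> float:
        # How much riskier the proposed action is than deferring to the backup policy.
        return self.q_unsafe(s, a) - self.v_unsafe(s)

    def filter(self, s: State, proposed: Action) -> Tuple[Action, bool]:
        # Override the learner's action only when its risk advantage exceeds eta.
        if self.advantage(s, proposed) > self.eta:
            return self.backup_policy(s), True
        return proposed, False


# Toy 1-D setting (an assumption): the state is unsafe once |s| >= 1,
# and the risk estimates are simple distance-based proxies.
def toy_v_unsafe(s: State) -> float:
    return min(1.0, abs(s))


def toy_q_unsafe(s: State, a: Action) -> float:
    return min(1.0, abs(s + a))


def toy_backup(s: State) -> Action:
    return -0.5 * s  # steer back toward the origin


shield = AdvantageIntervention(toy_q_unsafe, toy_v_unsafe, toy_backup, eta=0.1)

risky_action, risky_intervened = shield.filter(0.5, 0.4)   # moves toward the unsafe boundary
safe_action, safe_intervened = shield.filter(0.5, -0.2)    # moves away from the boundary
```

In this sketch the learner's own policy could be trained by any unconstrained RL algorithm on the filtered interaction, which mirrors the abstract's description of pairing the intervention mechanism with off-the-shelf methods. How interventions are surfaced to the learner (for example, as penalties or episode terminations) is left unspecified here.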

Code & Models

Repositories

nolanwagener/safe_rl · PyTorch · Official

Videos

Safe Reinforcement Learning Using Advantage-Based Intervention · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)