Conservative Safety Critics for Exploration

Homanga Bharadhwaj; Aviral Kumar; Nicholas Rhinehart; Sergey Levine; Florian Shkurti; Animesh Garg

arXiv:2010.14497 · cs.LG · April 27, 2021 · 32 cites

Open Access · 1 Video

TL;DR

This paper introduces a conservative safety critic for reinforcement learning that bounds the risk of catastrophic failures during exploration, ensuring safer training while maintaining competitive task performance.

Contribution

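It proposes a conservative safety critic that upper bounds the likelihood of catastrophic failure at every training iteration, with safety constraints satisfied with high probability, and characterizes the tradeoff between safety and policy improvement in RL.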

Findings

01 Achieves lower failure rates during training compared to prior methods.

02 Provides theoretical guarantees of safety and convergence.

03 Demonstrates effectiveness on navigation, manipulation, and locomotion tasks.

Abstract

Safe exploration presents a major challenge in reinforcement learning (RL): when active data collection requires deploying partially trained policies, we must ensure that these policies avoid catastrophically unsafe regions, while still enabling trial and error learning. In this paper, we target the problem of safe exploration in RL by learning a conservative safety estimate of environment states through a critic, and provably upper bound the likelihood of catastrophic failures at every training iteration. We theoretically characterize the tradeoff between safety and policy improvement, show that the safety constraints are likely to be satisfied with high probability during training, derive provable convergence guarantees for our approach, which is no worse asymptotically than standard RL, and demonstrate the efficacy of the proposed approach on a suite of challenging navigation,…
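
The idea of acting on a conservative safety estimate can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say how the critic is trained, so the sketch swaps in a tabular, count-based failure estimate with an additive pessimism margin. The function names, the `margin` and `threshold` parameters, and the toy statistics are all hypothetical and chosen only for illustration.

```python
def conservative_failure_estimate(failures, visits, margin=0.1):
    """Pessimistic failure-probability estimate (illustrative assumption).

    Uses the empirical failure rate plus a fixed margin, clipped to 1.0.
    Untried actions are treated as worst case.
    """
    if visits == 0:
        return 1.0
    return min(1.0, failures / visits + margin)


def safe_action(candidates, estimate, threshold):
    """Pick the first candidate whose estimated failure probability is
    at or below the threshold; if none qualifies, fall back to the
    candidate with the lowest estimate."""
    for action in candidates:
        if estimate(action) <= threshold:
            return action
    return min(candidates, key=estimate)


# Toy statistics: action -> (observed failures, times tried). Hypothetical data.
stats = {"left": (0, 20), "right": (5, 20), "jump": (0, 0)}


def estimate(action):
    return conservative_failure_estimate(*stats[action])


chosen = safe_action(["right", "jump", "left"], estimate, threshold=0.15)
print(chosen)
```

Because the estimate errs on the side of overestimating risk, an action passes the filter only when its pessimistic failure probability is under the threshold, which is the sense in which a conservative critic can cap the chance of failure during exploration.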

Videos

Conservative Safety Critics for Exploration · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning