Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning

Yecheng Jason Ma; Andrew Shen; Osbert Bastani; Dinesh Jayaraman

arXiv:2112.07701 · cs.LG · December 16, 2021 · 1 citation

TL;DR

This paper introduces CAP, a model-based safe reinforcement learning framework that uses uncertainty-based penalties and adaptive tuning to ensure safety and improve sample efficiency in real-world environments.

Contribution

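CAP is a model-based safe RL method that adds an uncertainty-based penalty to predicted costs and adaptively adjusts that penalty during training. Policies that satisfy the resulting conservative cost constraint are guaranteed to also be feasible in the true environment.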

Findings

01. CAP reduces safety violations compared to prior methods.

02. CAP improves sample efficiency in state and image-based environments.

03. Theoretical guarantees ensure safety of policies during training.

Abstract

Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective. Model-based RL algorithms hold promise for reducing unsafe real-world actions: they may synthesize policies that obey all constraints using simulated samples from a learned model. However, imperfect models can result in real-world constraint violations even for actions that are predicted to satisfy all constraints. We propose Conservative and Adaptive Penalty (CAP), a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the reward and the cost objectives. First, CAP inflates predicted costs using an uncertainty-based penalty. Theoretically, we show that policies that satisfy this conservative cost constraint are guaranteed to also be feasible in the true environment.…
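The cost-inflation idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: model uncertainty is approximated by the standard deviation across a hypothetical ensemble of cost predictions, the penalty weight is called `kappa`, and `update_kappa` is a simple proportional rule standing in for whatever adaptive scheme the paper uses.

```python
import numpy as np


def conservative_cost(ensemble_costs, kappa):
    """Inflate predicted per-step costs by kappa times ensemble disagreement.

    ensemble_costs has shape (n_models, n_steps). Using the ensemble standard
    deviation as the uncertainty estimate is an assumption of this sketch.
    """
    ensemble_costs = np.asarray(ensemble_costs, dtype=float)
    mean_cost = ensemble_costs.mean(axis=0)
    uncertainty = ensemble_costs.std(axis=0)
    return mean_cost + kappa * uncertainty


def is_feasible(per_step_costs, cost_limit):
    """Check a cumulative-cost constraint over one planned trajectory."""
    return float(np.sum(per_step_costs)) <= cost_limit


def update_kappa(kappa, observed_cost, cost_limit, step_size=0.1):
    """Hypothetical adaptive rule: raise kappa after real-world violations,
    lower it when there is slack, and never let it go negative."""
    return max(0.0, kappa + step_size * (observed_cost - cost_limit))


# Four hypothetical models predict costs for a three-step plan.
ensemble_costs = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])
cost_limit = 1.5

plain = conservative_cost(ensemble_costs, kappa=0.0)     # mean prediction only
inflated = conservative_cost(ensemble_costs, kappa=1.0)  # mean plus penalty

print(is_feasible(plain, cost_limit))     # plan passes on mean predictions
print(is_feasible(inflated, cost_limit))  # same plan rejected once penalized
```

In this toy case the plan looks acceptable under the mean prediction but is rejected once disagreement between models is penalized, which is the kind of conservatism the abstract describes. A larger `kappa` trades reward for safety, so adjusting it from observed real-world costs is one plausible way to balance the two objectives.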

Code & Models

Repositories

redrew/cap · PyTorch · Official

Videos

Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning · underline

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning