Safe Reinforcement Learning in Constrained Markov Decision Processes

Akifumi Wachi; Yanan Sui

arXiv:2008.06626 · cs.LG · August 18, 2020 · 54 cites

TL;DR

This paper introduces SNO-MDP, an algorithm for safe reinforcement learning that learns safety constraints and optimizes rewards within safe regions, with theoretical guarantees and practical validation in synthetic and real-world inspired environments.

Contribution

The paper proposes SNO-MDP, a novel algorithm that explores and optimizes safety and reward in unknown constrained MDPs with theoretical safety and optimality guarantees.

Findings

01. SNO-MDP effectively learns safety constraints and optimizes rewards.

02. Theoretical guarantees ensure safety constraint satisfaction and near-optimal reward.

03. Experimental results demonstrate success in synthetic and Mars exploration scenarios.

Abstract

Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications. In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints. Specifically, we take a stepwise approach for optimizing safety and cumulative reward. In our method, the agent first learns safety constraints by expanding the safe region, and then optimizes the cumulative reward in the certified safe region. We provide theoretical guarantees on both the satisfaction of the safety constraint and the near-optimality of the cumulative reward under proper regularity assumptions. We demonstrate the effectiveness of SNO-MDP through two experiments: one uses synthetic data in a new, openly available environment named GP-SAFETY-GYM, and the other simulates Mars…
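The two-step idea in the abstract (first expand a certified safe region, then optimize reward inside it) can be illustrated with a toy sketch. This is not the SNO-MDP implementation: it assumes a 1-D chain of states, noise-free safety observations, and a known Lipschitz constant in place of whatever statistical model the paper uses. The function names, the safety values `g`, the threshold `h`, and the rewards are all hypothetical.

```python
import numpy as np

def expand_safe_set(g, seed, h, lipschitz):
    """Step 1 (toy): grow the certified-safe set outward from a safe seed."""
    n = len(g)
    safe = set(seed)
    frontier = list(seed)
    while frontier:
        s = frontier.pop()
        observed = g[s]  # assumption: noise-free observation at a safe state
        for nxt in (s - 1, s + 1):
            if 0 <= nxt < n and nxt not in safe:
                # Worst-case safety value of a neighbor at distance 1.
                if observed - lipschitz * 1.0 >= h:
                    safe.add(nxt)
                    frontier.append(nxt)
    return safe

def value_iteration_on_safe_set(reward, safe, gamma=0.9, iters=200):
    """Step 2 (toy): value iteration where moves may only land on safe states."""
    values = np.zeros(len(reward))
    for _ in range(iters):
        new_values = values.copy()
        for s in safe:
            # Actions: step left, stay, or step right, restricted to safe states.
            candidates = [t for t in (s - 1, s, s + 1) if t in safe]
            new_values[s] = max(reward[t] + gamma * values[t] for t in candidates)
        values = new_values
    return values

# Hypothetical safety values (Lipschitz with constant 0.3) and rewards.
g = [1.0, 0.9, 0.85, 0.6, 0.3, 0.6]
reward = [0.0, 0.0, 0.0, 1.0, 0.0, 10.0]

safe = expand_safe_set(g, seed=[0], h=0.5, lipschitz=0.3)
values = value_iteration_on_safe_set(reward, safe)
print(sorted(safe))
print(np.round(values, 2))
```

In this toy setup, state 5 carries the largest reward but can only be reached through state 4, which is never certified, so the planner settles for the best reward inside the certified region. That is the kind of trade-off the stepwise approach makes explicit.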

Code & Models

Repositories

akifumi-wachi-4/safe_near_optimal_mdp (official)

Videos

Safe Reinforcement Learning in Constrained Markov Decision Processes · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques