Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

Matteo Turchetta; Felix Berkenkamp; Andreas Krause

arXiv:1606.04753 · cs.LG · January 30, 2017 · 67 cites

TL;DR

This paper introduces a safe exploration algorithm for finite Markov decision processes that uses Gaussian process priors to ensure safety constraints are met during exploration, demonstrated on rover terrain mapping.

Contribution

It presents a novel algorithm that guarantees safe exploration of the reachable MDP regions under unknown safety constraints using Gaussian processes.

Findings

01  Successfully explores safe regions without violating safety constraints.

02  Guarantees complete exploration of safely reachable states.

03  Validated on digital terrain models for rover exploration.

Abstract

In classical reinforcement learning, when exploring an environment, agents accept arbitrary short-term loss for long-term gain. This is infeasible for safety-critical applications, such as robotics, where even a single unsafe action may cause system failure. In this paper, we address the problem of safely exploring finite Markov decision processes (MDPs). We define safety in terms of an a priori unknown safety constraint that depends on states and actions. We aim to explore the MDP under this constraint, assuming that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop a novel algorithm for this task and prove that it is able to completely explore the safely reachable part of the MDP without violating the safety constraint. To achieve this, it cautiously explores safe states and actions in order to gain statistical confidence about…
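
The idea of certifying states as safe from Gaussian process confidence bounds can be illustrated with a small sketch. This is not the paper's algorithm. It is a toy example under stated assumptions: a 1-D chain of ten states, a zero-mean GP with a squared-exponential kernel, a hypothetical safety threshold of 0, and a hypothetical confidence scaling beta = 2. All function and variable names are invented for illustration. A state is treated as certified safe when the GP lower confidence bound clears the threshold, and as safely reachable when it can be reached from the start state by stepping only through certified states.

```python
import numpy as np
from collections import deque


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP (unit prior variance)."""
    K = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def reachable_safe(safe_mask, start):
    """Chain states reachable from `start` by stepping only through certified-safe states.

    Moves in this toy chain are reversible, so every reached state can also
    return to the start; a general MDP would need a separate returnability check.
    """
    reached = np.zeros(len(safe_mask), dtype=bool)
    if not safe_mask[start]:
        return reached
    reached[start] = True
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for n in (s - 1, s + 1):
            if 0 <= n < len(safe_mask) and safe_mask[n] and not reached[n]:
                reached[n] = True
                queue.append(n)
    return reached


# Hypothetical setup: safety feature observed at the first three states only.
states = np.arange(10, dtype=float)
x_obs = np.array([0.0, 1.0, 2.0])
y_obs = np.array([1.0, 1.0, 1.0])
threshold, beta = 0.0, 2.0  # assumed values, not taken from the paper

mean, std = gp_posterior(x_obs, y_obs, states)
safe = mean - beta * std >= threshold  # lower confidence bound test
reach = reachable_safe(safe, start=0)
print("certified safe:", np.flatnonzero(safe))
print("safely reachable:", np.flatnonzero(reach))
```

States far from any observation keep a wide confidence interval and stay uncertified, which is why an agent following this kind of rule has to gather measurements near the boundary of the certified set before it can expand it.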

Code & Models

Repositories

befelix/SafeMDP (official)

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Target Tracking and Data Fusion in Sensor Networks · Robotic Path Planning Algorithms