Safe Exploration in Markov Decision Processes

Teodor Mihai Moldovan; Pieter Abbeel

arXiv:1205.4810 · cs.LG · July 10, 2012 · 60 cites

Open Access

TL;DR

This paper introduces a safe exploration framework for Markov decision processes that avoids the impractical ergodicity assumption, providing an efficient algorithm for safe, yet exploratory, policies with demonstrated effectiveness on complex tasks.

Contribution

It formulates safety through ergodicity, proves that restricting attention to guaranteed safe policies is NP-hard, and proposes an efficient safe exploration algorithm that is compatible with existing exploration methods.

Findings

1. Our method outperforms classical exploration techniques in experiments.

2. The proposed approach ensures safety while maintaining effective exploration.

3. Application to Martian terrain exploration demonstrates practical utility.

Abstract

In environments with uncertain dynamics, exploration is necessary to learn how to perform well. Existing reinforcement learning algorithms provide strong exploration guarantees, but they tend to rely on an ergodicity assumption. The essence of ergodicity is that any state is eventually reachable from any other state by following a suitable policy. This assumption allows for exploration algorithms that operate by simply favoring states that have rarely been visited before. For most physical systems this assumption is impractical, as the systems would break before any reasonable exploration has taken place; that is, most physical systems do not satisfy the ergodicity assumption. In this paper we address the need for safe exploration methods in Markov decision processes. We first propose a general formulation of safety through ergodicity. We show that imposing safety by restricting attention to…
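
The ergodicity notion described in the abstract (every state reachable from every other state under some policy) can be illustrated with a small reachability check. The sketch below is not the paper's algorithm; it only tests the assumption on a toy model. The names `reachable_from`, `is_ergodic`, `loop_world`, and `cliff_world` are hypothetical, and the model is assumed to be given as a map from each state to the set of states that some action can reach with positive probability.

```python
from collections import deque


def reachable_from(start, successors):
    """Return the set of states reachable from `start` via breadth-first search.

    `successors` maps a state to the states that at least one action can
    lead to with positive probability (an assumed, simplified model format).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_ergodic(successors):
    """True if every state can reach every other state under some policy."""
    states = set(successors)
    for targets in successors.values():
        states.update(targets)
    return all(reachable_from(s, successors) == states for s in states)


# Toy model 1: a cycle, so every state can be revisited.
loop_world = {"a": {"b"}, "b": {"c"}, "c": {"a"}}

# Toy model 2: stepping off the edge leads to an absorbing "broken" state,
# mimicking a physical system that can be damaged irreversibly.
cliff_world = {
    "start": {"start", "edge"},
    "edge": {"start", "broken"},
    "broken": {"broken"},
}

print(is_ergodic(loop_world))
print(is_ergodic(cliff_world))
```

In the second toy model the absorbing state cannot return to `start`, which is the kind of irreversible failure that makes the ergodicity assumption unrealistic for physical systems.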

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Formal Methods in Verification