Safe Exploration in Markov Decision Processes
Teodor Mihai Moldovan, Pieter Abbeel

TL;DR
This paper introduces a safe exploration framework for Markov decision processes that avoids the impractical ergodicity assumption, providing an efficient algorithm for safe, yet exploratory, policies with demonstrated effectiveness on complex tasks.
Contribution
The paper formulates safety through ergodicity, shows that restricting attention to the set of guaranteed safe policies is NP-hard, and proposes an efficient exploration algorithm compatible with existing methods.
Findings
The method outperforms classical exploration techniques in the reported experiments.
The proposed approach ensures safety while maintaining effective exploration.
Application to Martian terrain exploration demonstrates practical utility.
Abstract
In environments with uncertain dynamics, exploration is necessary to learn how to perform well. Existing reinforcement learning algorithms provide strong exploration guarantees, but they tend to rely on an ergodicity assumption. The essence of ergodicity is that any state is eventually reachable from any other state by following a suitable policy. This assumption allows for exploration algorithms that operate by simply favoring states that have rarely been visited before. For most physical systems this assumption is impractical as the systems would break before any reasonable exploration has taken place, i.e., most physical systems don't satisfy the ergodicity assumption. In this paper we address the need for safe exploration methods in Markov decision processes. We first propose a general formulation of safety through ergodicity. We show that imposing safety by restricting attention to…
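The ergodicity notion described in the abstract (any state reachable from any other under a suitable policy) can be illustrated with a small reachability check. The following is a minimal sketch, not the paper's algorithm: it assumes a finite MDP with fully known transition probabilities, stored in a hypothetical nested-dictionary format `transitions[state][action] = {next_state: probability}`, and it only tests reachability with positive probability. The function names and the two toy MDPs are illustrative assumptions.

```python
from collections import deque


def reachable_from(start, successors):
    """Breadth-first search: all states reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_communicating(transitions):
    """Return True if every state can reach every other state.

    `transitions[s][a]` is a dict {next_state: probability} (assumed format).
    An edge s -> s2 exists if at least one action moves s to s2 with
    positive probability, i.e. some policy can make that move.
    """
    states = set(transitions)
    successors = {}
    for s, actions in transitions.items():
        successors[s] = set()
        for dist in actions.values():
            for s2, p in dist.items():
                if p > 0:
                    successors[s].add(s2)
                    states.add(s2)
    return all(reachable_from(s, successors) >= states for s in states)


# Toy MDP in which both states can always reach each other.
safe_mdp = {
    "a": {"stay": {"a": 1.0}, "go": {"b": 1.0}},
    "b": {"stay": {"b": 1.0}, "go": {"a": 1.0}},
}

# Toy MDP with an absorbing "broken" state: once entered, nothing else
# is reachable, so the ergodicity assumption fails.
risky_mdp = {
    "a": {"stay": {"a": 1.0}, "jump": {"b": 0.9, "broken": 0.1}},
    "b": {"go": {"a": 1.0}},
    "broken": {"stay": {"broken": 1.0}},
}

print(is_communicating(safe_mdp))
print(is_communicating(risky_mdp))
```

The second toy MDP mirrors the abstract's point about physical systems: a single irreversible "broken" state is enough to violate the assumption, which is why exploration that simply favors rarely visited states can be unsafe.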
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Formal Methods in Verification
