Explicit Explore, Exploit, or Escape ($E^4$): near-optimal safety-constrained reinforcement learning in polynomial time

David M. Bossens; Nicholas Bishop

arXiv:2111.07395 · cs.LG · June 24, 2022 · 1 citation

TL;DR

This paper introduces $E^4$, a model-based reinforcement learning algorithm that learns a near-optimal policy in polynomial time while respecting safety constraints, using explicitly separate exploration, exploitation, and escape policies.

Contribution

The paper extends the $E^{3}$ algorithm to a robust constrained setting, providing a polynomial-time method for safe, near-optimal policy learning in unknown environments.

Findings

01. $E^4$ guarantees that safety constraints are satisfied during learning.

02. $E^4$ finds near-optimal policies in polynomial time.

03. Theoretical analysis supports robustness and efficiency.

Abstract

In reinforcement learning (RL), an agent must explore an initially unknown environment in order to learn a desired behaviour. When RL agents are deployed in real world environments, safety is of primary concern. Constrained Markov decision processes (CMDPs) can provide long-term safety constraints; however, the agent may violate the constraints in an effort to explore its environment. This paper proposes a model-based RL algorithm called Explicit Explore, Exploit, or Escape ($E^{4}$), which extends the Explicit Explore or Exploit ($E^{3}$) algorithm to a robust CMDP setting. $E^{4}$ explicitly separates exploitation, exploration, and escape CMDPs, allowing targeted policies for policy improvement across known states, discovery of unknown states, as well as safe return to known states. $E^{4}$ robustly optimises these policies on the worst-case CMDP from a set of CMDP models consistent with…
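
The worst-case idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it only evaluates one fixed policy against a finite, hand-made set of candidate transition models and reports the lowest discounted reward and the highest discounted constraint cost. The names `evaluate` and `worst_case`, the two toy models, the discount factor, and the budget of 5.0 are all assumptions made for illustration. Taking the minimum reward and the maximum cost independently is a conservative simplification, since the two extremes may come from different models.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular model: model[state][action] -> [(next_state, probability), ...]
Model = Dict[int, Dict[int, List[Tuple[int, float]]]]


def evaluate(model: Model, policy: Dict[int, int], signal: Dict[int, float],
             gamma: float = 0.9, sweeps: int = 500) -> Dict[int, float]:
    """Iterative policy evaluation of a per-state signal (reward or constraint cost)."""
    values = {s: 0.0 for s in model}
    for _ in range(sweeps):
        values = {
            s: signal[s] + gamma * sum(p * values[nxt] for nxt, p in model[s][policy[s]])
            for s in model
        }
    return values


def worst_case(models: List[Model], policy: Dict[int, int],
               reward: Dict[int, float], cost: Dict[int, float],
               start: int, gamma: float = 0.9) -> Tuple[float, float]:
    """Lowest reward value and highest cost value over all candidate models."""
    reward_value = min(evaluate(m, policy, reward, gamma)[start] for m in models)
    cost_value = max(evaluate(m, policy, cost, gamma)[start] for m in models)
    return reward_value, cost_value


# Toy example (assumed): state 0 is safe and rewarding, state 1 is an absorbing
# unsafe state. The two candidate models disagree on the chance of slipping 0 -> 1.
model_a: Model = {0: {0: [(0, 0.9), (1, 0.1)]}, 1: {0: [(1, 1.0)]}}
model_b: Model = {0: {0: [(0, 0.8), (1, 0.2)]}, 1: {0: [(1, 1.0)]}}
policy = {0: 0, 1: 0}
reward = {0: 1.0, 1: 0.0}
cost = {0: 0.0, 1: 1.0}
budget = 5.0  # assumed limit on the discounted constraint cost

reward_value, cost_value = worst_case([model_a, model_b], policy, reward, cost, start=0)
optimistic_cost = evaluate(model_a, policy, cost)[0]

print(f"worst-case reward value: {reward_value:.3f}")
print(f"worst-case cost value:   {cost_value:.3f}")
print(f"within budget under model_a only:  {optimistic_cost <= budget}")
print(f"within budget under the worst case: {cost_value <= budget}")
```

In this toy setting the policy looks acceptable if only `model_a` is trusted, but it exceeds the assumed budget once `model_b` is also considered plausible, which is the kind of situation a worst-case treatment is meant to catch.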

Taxonomy

Topics: Reinforcement Learning in Robotics