Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time

Abhijit Mazumdar; Rafal Wisniewski; Manuela L. Bujorianu

arXiv:2403.15928·cs.LG·March 26, 2024·1 cites

Open Access

TL;DR

This paper introduces an online reinforcement learning algorithm for constrained Markov decision processes with stochastic stopping times, ensuring safety during learning without requiring a process model.

Contribution

It proposes a model-free, linear-programming-based algorithm that guarantees safety with high confidence, and it introduces a method for computing safe baseline policies.

Findings

01. The algorithm maintains safety constraints during learning.

02. Simulation results confirm the effectiveness of the proposed approach.

03. Efficient exploration is achieved using a proxy state-space set.

Abstract

In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Although the problem merits the attention of the scientific community, learning an optimal policy under a stochastic stopping time without violating safety constraints during the learning phase has yet to be addressed. To this end, we propose an algorithm based on linear programming that does not require a process model. We show that the learned policy is safe with high confidence. We also propose a method to compute a safe baseline policy, which is central to developing algorithms that do not violate the safety constraints. Finally, we provide simulation results to show the efficacy of the proposed algorithm. Further, we demonstrate that efficient exploration can be achieved by defining a subset of the state space called the proxy set.
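
To make the setting concrete, the following is a minimal sketch of what a safety check on a candidate baseline policy could look like. It is not the paper's algorithm: the paper's method is model-free and based on linear programming, whereas this sketch assumes a known toy model and evaluates a fixed policy by simple fixed-point iteration. It also assumes that the safety constraint bounds the probability of reaching an unsafe state before the process stops. The states, actions, probabilities, costs, and the threshold 0.1 are all hypothetical.

```python
# Hypothetical toy CMDP; every number here is illustrative and not from the paper.
GOAL, UNSAFE = "goal", "unsafe"  # absorbing states: the episode stops at either one

# P[state][action] maps each successor to its probability.
P = {
    0: {"cautious": {1: 0.9, 0: 0.1}, "fast": {GOAL: 0.7, UNSAFE: 0.3}},
    1: {"cautious": {GOAL: 0.95, UNSAFE: 0.05}, "fast": {GOAL: 0.8, UNSAFE: 0.2}},
}
# COST[state][action] is the cost paid each time the action is taken.
COST = {0: {"cautious": 2.0, "fast": 1.0}, 1: {"cautious": 2.0, "fast": 1.0}}


def evaluate(policy, sweeps=500):
    """Evaluate a fixed randomized policy on the known toy model.

    Returns two dicts over transient states: the expected cumulative cost
    until stopping, and the probability that the episode ends in UNSAFE.
    """
    cost = {s: 0.0 for s in P}
    unsafe = {s: 0.0 for s in P}
    for _ in range(sweeps):
        new_cost, new_unsafe = {}, {}
        for s in P:
            c = u = 0.0
            for a, prob_a in policy[s].items():
                c_a = COST[s][a]
                u_a = 0.0
                for nxt, p in P[s][a].items():
                    if nxt == UNSAFE:
                        u_a += p  # stopped in the unsafe set
                    elif nxt != GOAL:
                        c_a += p * cost[nxt]  # keep accumulating from a transient state
                        u_a += p * unsafe[nxt]
                c += prob_a * c_a
                u += prob_a * u_a
            new_cost[s], new_unsafe[s] = c, u
        cost, unsafe = new_cost, new_unsafe
    return cost, unsafe


def is_safe(policy, start=0, threshold=0.1):
    """Check the assumed safety constraint: P(end in UNSAFE | start) <= threshold."""
    _, unsafe = evaluate(policy)
    return unsafe[start] <= threshold


# Two deterministic candidate policies, written as action distributions per state.
cautious = {s: {"cautious": 1.0} for s in P}
fast = {s: {"fast": 1.0} for s in P}
```

In this toy model the all-cautious policy satisfies the assumed constraint at a higher expected cost, while the all-fast policy is cheaper but violates it. That trade-off is what makes a safe baseline policy a useful starting point for exploration.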

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Smart Grid Energy Management