Safe Policy Improvement in Constrained Markov Decision Processes

Luigi Berducci; Radu Grosu

arXiv:2210.11259 · cs.LG · October 21, 2022

TL;DR

This paper introduces a method for safe reinforcement learning in constrained Markov decision processes. It combines automatic reward shaping from formal requirements with a safe policy-update algorithm that provides high-confidence safety guarantees during policy improvement.

Contribution

The paper contributes an automatic reward-shaping procedure that turns a set of formal requirements into a scalar reward signal, and a safe policy-update algorithm with high-confidence guarantees, both aimed at safety-critical applications.

Findings

01. Effective safe policy improvement demonstrated on control benchmarks.

02. Robustness maintained under hyperparameter perturbations.

03. Model-based RL enhances data efficiency and safety.

Abstract

The automatic synthesis of a policy through reinforcement learning (RL) from a given set of formal requirements depends on the construction of a reward signal and consists of the iterative application of many policy-improvement steps. The synthesis algorithm has to balance target, safety, and comfort requirements in a single objective and to guarantee that the policy improvement does not increase the number of safety-requirements violations, especially for safety-critical applications. In this work, we present a solution to the synthesis problem by solving its two main challenges: reward-shaping from a set of formal requirements and safe policy update. For the former, we propose an automatic reward-shaping procedure, defining a scalar reward signal compliant with the task specification. For the latter, we introduce an algorithm ensuring that the policy is improved in a safe fashion with…
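
To make the idea of a high-confidence safe policy update concrete, here is a minimal sketch of one possible acceptance test. It is not the authors' algorithm: the function names, the use of a Hoeffding bound, and the assumption that each evaluation episode yields an independent 0/1 safety-violation outcome (for example from rollouts in a simulator or a learned model) are all illustrative choices made here.

```python
import math
from typing import Sequence


def hoeffding_upper_bound(violations: Sequence[int], delta: float) -> float:
    """Upper confidence bound on the true violation rate.

    `violations` holds one 0/1 outcome per evaluation episode (assumed i.i.d.).
    With probability at least 1 - delta, the true rate is below the returned value.
    """
    n = len(violations)
    if n == 0:
        raise ValueError("need at least one evaluation episode")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    empirical_rate = sum(violations) / n
    # One-sided Hoeffding inequality for variables bounded in [0, 1].
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, empirical_rate + slack)


def accept_update(candidate_violations: Sequence[int],
                  baseline_rate: float,
                  delta: float = 0.05) -> bool:
    """Hypothetical acceptance rule: keep the candidate policy only if its
    violation rate is, with confidence 1 - delta, no worse than the baseline."""
    return hoeffding_upper_bound(candidate_violations, delta) <= baseline_rate


if __name__ == "__main__":
    # Many clean episodes give a tight bound; few episodes do not.
    print(accept_update([0] * 1000, baseline_rate=0.05))
    print(accept_update([0] * 10, baseline_rate=0.05))
```

Under these assumptions, a candidate with no observed violations is still rejected when too few episodes have been collected, because the confidence bound stays above the baseline rate. The check is deliberately conservative: it trades slower improvement for a lower risk of accepting a policy that violates safety requirements more often.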
