Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints

Siow Meng Low; Akshat Kumar

arXiv:2405.03005·cs.LG·May 7, 2024

TL;DR

This paper introduces a scalable reinforcement learning framework that learns to satisfy complex non-Markovian safety constraints by modeling safety as a trajectory-based credit assignment problem and dynamically balancing safety and reward.

Contribution

It proposes a novel safety model for credit assignment on trajectories, an RL-as-inference algorithm for safe policy optimization, and a dynamic method to balance safety and reward.

Findings

01. Successfully handles non-Markovian safety constraints
02. Scalable approach demonstrated on complex safety tasks
03. Effectively balances safety and reward during training

Abstract

In safe Reinforcement Learning (RL), safety cost is typically defined as a function of the immediate state and action. In practice, safety constraints are often non-Markovian because the state representation lacks sufficient fidelity, and the safety cost may not be known. We therefore address a general setting where safety labels (e.g., safe or unsafe) are associated with state-action trajectories. Our key contributions are as follows. First, we design a safety model that performs credit assignment to assess the contributions of partial state-action trajectories to safety. This safety model is trained using a labeled safety dataset. Second, using an RL-as-inference strategy, we derive an effective algorithm for optimizing a safe policy with the learned safety model. Finally, we devise a method to dynamically adapt the tradeoff coefficient between reward maximization and safety…
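To make the idea of trajectory-level credit assignment concrete, here is a minimal sketch, not the authors' model. It assumes the probability that a trajectory is safe factorizes into per-step terms, each conditioned on the full history up to that step. The function `step_safe_prob` is a hypothetical hand-written stand-in for a learned safety model, and the zone names "A", "B", and "C" are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple

Step = Tuple[str, str]  # (state, action); hypothetical string labels


def step_safe_prob(prefix: Sequence[Step]) -> float:
    """Hypothetical stand-in for a learned model: P(latest step is safe | history).

    Toy non-Markovian rule (an assumption for illustration): being in "B" is
    risky only if "A" was visited at some earlier step.
    """
    state, _ = prefix[-1]
    visited_a = any(s == "A" for s, _ in prefix[:-1])
    if state == "B" and visited_a:
        return 0.1
    return 0.99


def trajectory_safe_prob(traj: Sequence[Step]) -> float:
    """Probability the whole trajectory is safe: product of per-step terms."""
    return math.prod(step_safe_prob(traj[: t + 1]) for t in range(len(traj)))


def step_credits(traj: Sequence[Step]) -> List[float]:
    """Per-step share of the trajectory's negative log safety probability."""
    return [-math.log(step_safe_prob(traj[: t + 1])) for t in range(len(traj))]


# The same final state "B" is scored differently depending on the history.
risky = [("A", "go"), ("C", "go"), ("B", "stay")]
benign = [("C", "go"), ("B", "stay")]
credits = step_credits(risky)
```

In this toy setup the last step of `risky` receives almost all of the credit for the low safety probability, even though the state "B" alone is not unsafe. A cost function of only the immediate state and action could not express that dependence on history.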

Taxonomy

Topics: Software Reliability and Analysis Research · Reinforcement Learning in Robotics · Fault Detection and Control Systems