Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints
Siow Meng Low, Akshat Kumar

TL;DR
This paper introduces a scalable reinforcement learning framework that learns to satisfy complex non-Markovian safety constraints by modeling safety as a trajectory-based credit assignment problem and dynamically balancing safety and reward.
Contribution
It proposes a novel safety model for credit assignment on trajectories, an RL-as-inference algorithm for safe policy optimization, and a dynamic method to balance safety and reward.
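The idea of dynamically balancing safety and reward can be pictured with a generic dual-ascent style update on a tradeoff coefficient. This is a minimal sketch of that standard technique, not the update rule derived in the paper: the names `update_tradeoff` and `combined_objective`, the target safe rate, and the step size are all assumptions made for illustration.

```python
def update_tradeoff(lam, safe_rate, target=0.95, step=0.5, lam_max=100.0):
    """Hypothetical rule: raise lam when the observed fraction of safe
    trajectories is below the target, lower it otherwise."""
    lam = lam + step * (target - safe_rate)
    # Keep the coefficient non-negative and bounded.
    return min(max(lam, 0.0), lam_max)


def combined_objective(reward, log_safe_prob, lam):
    """Assumed scalarized objective: reward plus lam times a log safety term."""
    return reward + lam * log_safe_prob


# Usage: the coefficient grows while the policy is unsafe and shrinks once
# the safe rate exceeds the target.
lam = 1.0
history = []
for safe_rate in [0.5, 0.7, 0.9, 0.99]:
    lam = update_tradeoff(lam, safe_rate)
    history.append(lam)
```

A bounded, non-negative coefficient is a common design choice here, since it keeps the safety term from being rewarded negatively or from swamping the reward signal entirely.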
Findings
Successfully handles non-Markovian safety constraints
Scalable approach demonstrated on complex safety tasks
Effectively balances safety and reward during training
Abstract
In safe Reinforcement Learning (RL), safety cost is typically defined as a function of the immediate state and action. In practice, safety constraints can often be non-Markovian because the state representation lacks sufficient fidelity, and the safety cost may not be known. We therefore address a general setting where safety labels (e.g., safe or unsafe) are associated with state-action trajectories. Our key contributions are: first, we design a safety model that specifically performs credit assignment to assess the contributions of partial state-action trajectories to safety. This safety model is trained using a labeled safety dataset. Second, using an RL-as-inference strategy, we derive an effective algorithm for optimizing a safe policy using the learned safety model. Finally, we devise a method to dynamically adapt the tradeoff coefficient between reward maximization and safety…
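To make the setting concrete, the sketch below shows a toy non-Markovian safety label and one way to spread that label over the steps of a trajectory. Everything in it is an assumption for illustration: the rule "three consecutive steps in a hazard is unsafe", the hand-written `prefix_safety` scores standing in for a learned safety model, and the difference-based `step_credit`. It is not the architecture or training procedure used in the paper.

```python
def trajectory_label(in_hazard, limit=3):
    """Toy label: 0 (unsafe) if the agent spends `limit` consecutive steps
    in a hazard, else 1 (safe). The label depends on history, so it is
    non-Markovian with respect to the per-step hazard flag."""
    run = 0
    for flag in in_hazard:
        run = run + 1 if flag else 0
        if run >= limit:
            return 0
    return 1


def prefix_safety(in_hazard, limit=3):
    """Hand-written stand-in for a learned model: a safety score for each
    trajectory prefix, dropping as the consecutive-hazard run grows."""
    probs, run, violated = [], 0, False
    for flag in in_hazard:
        run = run + 1 if flag else 0
        violated = violated or run >= limit
        probs.append(0.0 if violated else 1.0 - run / (limit + 1))
    return probs


def step_credit(probs):
    """Credit for step t is the drop in prefix safety caused by that step.
    A negative value means the step restored some safety margin."""
    credits, prev = [], 1.0
    for p in probs:
        credits.append(prev - p)
        prev = p
    return credits


traj = [0, 1, 1, 1, 0]          # hazard flags per step
probs = prefix_safety(traj)
credits = step_credit(probs)
```

Because the credits telescope, they sum to one minus the final prefix score, so the trajectory-level outcome is fully distributed across individual steps.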
Taxonomy
Topics: Software Reliability and Analysis Research · Reinforcement Learning in Robotics · Fault Detection and Control Systems
