Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning
Nabil Omi, Hosein Hasanbeig, Hiteshi Sharma, Sriram K. Rajamani, Siddhartha Sen

TL;DR
This paper introduces a formal, model-agnostic meta-learning framework for safe reinforcement learning that uses a safeguard modeled as a finite-state machine to ensure safety across tasks, enabling efficient transfer of safety knowledge.
Contribution
The paper presents a novel, flexible safety framework for reinforcement learning that is model-agnostic, capable of handling complex safety specifications, and transferable across tasks with minimal violations.
Findings
Agents achieve near-minimal safety violations in experiments
The framework is applicable from pixel-level control to language models
Baseline methods underperform compared to the proposed approach
Abstract
In this paper we propose a formal, model-agnostic meta-learning framework for safe reinforcement learning. Our framework is inspired by how parents safeguard their children across a progression of increasingly riskier tasks, imparting a sense of safety that is carried over from task to task. We model this as a meta-learning process where each task is synchronized with a safeguard that monitors safety and provides a reward signal to the agent. The safeguard is implemented as a finite-state machine based on a safety specification; the reward signal is formally shaped around this specification. The safety specification and its corresponding safeguard can be arbitrarily complex and non-Markovian, which adds flexibility to the training process and explainability to the learned policy. The design of the safeguard is manual but it is high-level and model-agnostic, which gives rise to an…
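As a rough illustration of the idea in the abstract, a safeguard can be sketched as a small finite-state machine that is stepped in sync with the task and adds a safety term to the task reward. The sketch below is not the authors' construction: the class name `Safeguard`, the event labels `armor` and `hazard`, the state names, and the fixed penalty value are all hypothetical, chosen only to show how a history-dependent (non-Markovian) specification such as "do not enter a hazard before picking up armor" might be monitored.

```python
class Safeguard:
    """Hypothetical finite-state safety monitor over high-level event labels."""

    def __init__(self, transitions, unsafe, initial="q0", penalty=-1.0):
        self.transitions = dict(transitions)  # (state, label) -> next state
        self.unsafe = frozenset(unsafe)       # states that count as violations
        self.initial = initial
        self.penalty = penalty                # assumed fixed penalty, for illustration
        self.state = initial

    def reset(self):
        self.state = self.initial

    def step(self, label):
        # Labels with no listed transition leave the monitor state unchanged.
        self.state = self.transitions.get((self.state, label), self.state)
        violated = self.state in self.unsafe
        # The penalty applies on every step spent in an unsafe state.
        return (self.penalty if violated else 0.0), violated


def make_armor_safeguard():
    # Assumed spec: "hazard" is only safe after "armor" has been observed.
    return Safeguard(
        transitions={
            ("q0", "armor"): "q1",      # armor picked up, hazards now tolerated
            ("q0", "hazard"): "q_bad",  # hazard without armor is a violation
        },
        unsafe={"q_bad"},
    )


def shaped_rewards(safeguard, labels, task_rewards):
    """Step the safeguard alongside one episode and add its reward signal."""
    safeguard.reset()
    shaped = []
    for label, task_reward in zip(labels, task_rewards):
        safety_reward, _ = safeguard.step(label)
        shaped.append(task_reward + safety_reward)
    return shaped
```

In this sketch the same `hazard` label is penalized or not depending on what the monitor has seen earlier in the episode, which is the sense in which the specification is non-Markovian. Because the monitor only consumes high-level labels, it makes no assumption about the agent's model or observation space.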
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training
