Effects of Safety State Augmentation on Safe Exploration
Aivar Sootla, Alexander I. Cowen-Rivers, Jun Wang, Haitham Bou Ammar

TL;DR
This paper introduces Simmer, a safety state augmentation method for safe reinforcement learning that schedules the safety budget during training to improve safety and stability.
Contribution
The paper proposes a safety state augmentation technique that enables dynamic safety budget scheduling, enhancing safe exploration and stability in model-free RL.
Findings
Simmer improves safety during training in constrained RL tasks.
It stabilizes training and improves performance in RL with average-cost constraints.
The approach effectively manages safety budgets during learning.
Abstract
Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability…
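The safety state described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so the code below assumes the simplest one consistent with its description: the safety state starts at the safety budget and decreases by the cost incurred at each step, so it stays nonnegative exactly while the accumulated cost is within budget. The names `SafetyStateTracker`, `augment`, and `linear_budget_schedule` are hypothetical, and the linear schedule is only a placeholder for the idea of changing the budget across training epochs; it is not claimed to be the schedule used in Simmer.

```python
from dataclasses import dataclass


@dataclass
class SafetyStateTracker:
    """Tracks a scalar safety state next to the environment state.

    Assumed update rule (not taken from the abstract):
        z_0 = budget,  z_{t+1} = z_t - cost_t
    so z_t >= 0 if and only if the accumulated cost is within the budget.
    """

    budget: float
    safety_state: float = 0.0

    def reset(self) -> float:
        # The initial value of the safety state is the available safety budget.
        self.safety_state = self.budget
        return self.safety_state

    def step(self, cost: float) -> float:
        # The remaining value acts as a distance toward constraint violation.
        self.safety_state -= cost
        return self.safety_state

    def is_safe(self) -> bool:
        return self.safety_state >= 0.0


def augment(observation, safety_state):
    """Append the safety state to an observation (the augmented state)."""
    return tuple(observation) + (safety_state,)


def linear_budget_schedule(start, end, num_epochs):
    """Hypothetical schedule: move the budget linearly from start to end."""
    if num_epochs == 1:
        return [end]
    step = (end - start) / (num_epochs - 1)
    return [start + step * i for i in range(num_epochs)]


if __name__ == "__main__":
    for epoch, budget in enumerate(linear_budget_schedule(1.0, 5.0, 5)):
        tracker = SafetyStateTracker(budget=budget)
        z = tracker.reset()
        for cost in (0.5, 1.0, 2.0):  # made-up per-step costs
            z = tracker.step(cost)
        print(epoch, budget, augment([0.0, 0.0], z), tracker.is_safe())
```

In this sketch a policy would receive the augmented observation, so it can condition its behaviour on how much budget remains, and the training loop can change the budget per epoch without altering the environment itself.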
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Reinforcement Learning in Robotics
