Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery
Xiao Zhang, Hai Zhang, Hongtu Zhou, Chang Huang, Di Zhang, Chen Ye, Junqiao Zhao

TL;DR
This paper introduces a novel safe reinforcement learning method that constructs a safety boundary to avoid dead-ends, enabling safer exploration and improved task performance in continuous control tasks.
Contribution
The paper proposes a boundary-based safety mechanism and a decoupled RL framework with offline safety critic pretraining for enhanced safe exploration.
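The decoupled framework can be illustrated with a Recovery RL-style switching rule. The task policy proposes an action, a safety critic scores it, and the recovery policy takes over when the score crosses a threshold. All names here (`select_action`, `safety_critic`, `threshold`) and the toy 1-D environment are hypothetical. The sketch assumes the critic returns a risk estimate where larger means less safe. It does not reproduce how the paper constructs its dead-end boundary or pretrains the critic offline.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]


def select_action(
    state: State,
    task_policy: Callable[[State], Action],
    recovery_policy: Callable[[State], Action],
    safety_critic: Callable[[State, Action], float],
    threshold: float,
) -> Tuple[Action, bool]:
    """Return the action to execute and whether the recovery policy took over."""
    proposed = task_policy(state)
    # Hand control to the recovery policy only when the task action looks too risky.
    if safety_critic(state, proposed) > threshold:
        return recovery_policy(state), True
    return proposed, False


# Toy 1-D example (hypothetical): positions >= 1.0 are unsafe.
def task_policy(state: State) -> Action:
    return [0.5]  # always pushes right, ignoring safety


def recovery_policy(state: State) -> Action:
    return [-0.5]  # always retreats left


def safety_critic(state: State, action: Action) -> float:
    # Risk is 1 if the next position would be unsafe, else 0.
    return 1.0 if state[0] + action[0] >= 1.0 else 0.0


print(select_action([0.2], task_policy, recovery_policy, safety_critic, 0.5))  # ([0.5], False)
print(select_action([0.6], task_policy, recovery_policy, safety_critic, 0.5))  # ([-0.5], True)
```

Keeping the two policies separate lets the task policy be optimized purely for reward, while safety is enforced at action-selection time.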
Findings
Better task performance than state-of-the-art methods.
Fewer safety violations during training.
Effective dead-end state discrimination.
Abstract
Safety is one of the main challenges in applying reinforcement learning to real-world tasks. To ensure safety during and after the training process, existing methods tend to adopt overly conservative policies to avoid unsafe situations. However, overly conservative policies severely hinder exploration and make the algorithms substantially less rewarding. In this paper, we propose a method to construct a boundary that discriminates between safe and unsafe states. The boundary we construct is equivalent to distinguishing dead-end states. It indicates the maximum extent to which safe exploration is guaranteed, and thus imposes minimal limitation on exploration. Similar to Recovery Reinforcement Learning, we utilize a decoupled RL framework to learn two policies: (1) a task policy that only considers improving task performance, and (2) a recovery policy that maximizes safety. The recovery…
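The dead-end notion behind the boundary can be made concrete on a small deterministic, tabular MDP. Here a state is treated as a dead-end if it is not itself unsafe but every action leads either to an unsafe state or to another dead-end, so reaching an unsafe state is unavoidable. The sketch computes that set by fixed-point iteration. Any state that is neither unsafe nor a dead-end lies on the safe side of the boundary. The function name `find_dead_ends`, the corridor MDP, and the deterministic tabular setting are assumptions for illustration only. The paper targets continuous control, where such a boundary would have to be estimated with learned critics rather than enumerated.

```python
from typing import Dict, Hashable, Set

# transitions[state][action] -> next state (deterministic, tabular)
Transitions = Dict[Hashable, Dict[Hashable, Hashable]]


def find_dead_ends(transitions: Transitions, unsafe: Set[Hashable]) -> Set[Hashable]:
    """Return non-unsafe states from which every action sequence reaches `unsafe`."""
    doomed = set(unsafe)
    changed = True
    while changed:  # iterate until no new doomed states are found
        changed = False
        for state, actions in transitions.items():
            if state in doomed:
                continue
            # A state is doomed if all of its actions lead to doomed states.
            if actions and all(nxt in doomed for nxt in actions.values()):
                doomed.add(state)
                changed = True
    return doomed - set(unsafe)


# Toy corridor (hypothetical): state 5 is unsafe and absorbing,
# and states 3 and 4 can only move toward it.
mdp = {
    0: {"left": 0, "right": 1},
    1: {"left": 0, "right": 2},
    2: {"left": 1, "right": 3},
    3: {"left": 4, "right": 4},
    4: {"left": 5, "right": 5},
    5: {"left": 5, "right": 5},
}
dead_ends = find_dead_ends(mdp, unsafe={5})
safe = set(mdp) - dead_ends - {5}
print(sorted(dead_ends), sorted(safe))  # [3, 4] [0, 1, 2]
```

In this toy case, a boundary placed at the dead-ends rather than at the unsafe state itself still lets the agent explore states 0 to 2 freely. This mirrors the abstract's point that a dead-end boundary restricts exploration only as much as necessary.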
Taxonomy
Topics: Reinforcement Learning in Robotics
