Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery

Xiao Zhang; Hai Zhang; Hongtu Zhou; Chang Huang; Di Zhang; Chen Ye; Junqiao Zhao

arXiv:2306.13944 · cs.LG · June 27, 2023 · 1 citation

TL;DR

This paper introduces a novel safe reinforcement learning method that constructs a safety boundary to avoid dead-ends, enabling safer exploration and improved task performance in continuous control tasks.

Contribution

The paper proposes a boundary-based safety mechanism and a decoupled RL framework with offline safety critic pretraining for enhanced safe exploration.

Findings

1. Better task performance than state-of-the-art methods.

2. Fewer safety violations during training.

3. Effective dead-end state discrimination.

Abstract

Safety is one of the main challenges in applying reinforcement learning to realistic environmental tasks. To ensure safety during and after training, existing methods tend to adopt overly conservative policies to avoid unsafe situations. However, an overly conservative policy severely hinders exploration and makes the algorithms substantially less rewarding. In this paper, we propose a method to construct a boundary that discriminates between safe and unsafe states. The boundary we construct is equivalent to distinguishing dead-end states; it indicates the maximum extent to which safe exploration is guaranteed and thus imposes minimal limitation on exploration. Similar to Recovery Reinforcement Learning, we utilize a decoupled RL framework to learn two policies: (1) a task policy that only considers improving the task performance, and (2) a recovery policy that maximizes safety. The recovery…
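
The decoupled two-policy idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the switching rule (fall back to the recovery policy when a safety critic scores the task policy's proposed action above a threshold) follows the general Recovery RL pattern, and the names `select_action`, `safety_critic`, `threshold`, and the toy 1-D environment are all hypothetical, introduced only for illustration.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]


def select_action(
    state: State,
    task_policy: Callable[[State], Action],
    recovery_policy: Callable[[State], Action],
    safety_critic: Callable[[State, Action], float],
    threshold: float = 0.5,  # hypothetical risk threshold, not from the paper
) -> Tuple[Action, bool]:
    """Return (action, used_recovery).

    The task policy proposes an action. If the safety critic's risk
    estimate for that action exceeds the threshold, the recovery
    policy's action is executed instead.
    """
    proposed = task_policy(state)
    risk = safety_critic(state, proposed)
    if risk > threshold:
        return recovery_policy(state), True
    return proposed, False


# Toy 1-D example (an assumption for illustration): the agent moves along
# a line, and positions at or beyond 0.8 are treated as leading to a dead-end.
def toy_task_policy(state: State) -> Action:
    return [0.3]  # always push forward to make task progress


def toy_recovery_policy(state: State) -> Action:
    return [-0.3]  # always back away from the unsafe region


def toy_safety_critic(state: State, action: Action) -> float:
    next_position = state[0] + action[0]
    return 1.0 if next_position >= 0.8 else 0.0


if __name__ == "__main__":
    for start in (0.0, 0.6):
        action, used_recovery = select_action(
            [start], toy_task_policy, toy_recovery_policy, toy_safety_critic
        )
        print(start, action, used_recovery)
```

In this sketch the task policy never needs to reason about safety; only the critic and the recovery policy do, which is the sense in which the framework is decoupled. In practice the critic would be a learned function rather than the hand-written rule used here.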

Taxonomy

Topics: Reinforcement Learning in Robotics