Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach
Subin Huh, Insoon Yang

TL;DR
This paper introduces a Lyapunov-based, model-free safe reinforcement learning method that provides probabilistic safety guarantees for autonomous systems. The method monotonically expands the set of states identified as safe, accelerates safe exploration, and applies to high-dimensional systems.
Contribution
It develops a Lyapunov-based safe RL framework that uses a Lyapunov function to restrain each policy improvement stage so that safety is preserved, incorporates an efficient safe exploration scheme, and extends to deep RL with a Lagrangian approach for complex systems.
Findings
The method was successfully applied to continuous control benchmarks.
The sequence of safe policies yields a safe set that expands monotonically and gradually converges.
Efficient safe exploration accelerates identifying the safety of unexamined states.
Abstract
Emerging applications in robotics and autonomous systems, such as autonomous driving and robotic surgery, often involve critical safety constraints that must be satisfied even when information about system models is limited. In this regard, we propose a model-free safety specification method that learns the maximal probability of safe operation by carefully combining probabilistic reachability analysis and safe reinforcement learning (RL). Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage. As a result, it yields a sequence of safe policies that determine the range of safe operation, called the safe set, which monotonically expands and gradually converges. We also develop an efficient safe exploration scheme that accelerates the process of identifying the safety of unexamined states. Exploiting the Lyapunov shielding, our…
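To make the notion of a safe set concrete, the following is a minimal sketch of the probabilistic reachability quantity the abstract refers to, not the authors' algorithm. It assumes a small tabular Markov chain with a known transition model under a fixed policy, whereas the paper's method is model-free. The chain, the function names `unsafe_reach_probability` and `safe_set`, and the tolerance `alpha` are all hypothetical and chosen only for illustration.

```python
def unsafe_reach_probability(transitions, unsafe, n_states, tol=1e-10, max_iters=10_000):
    """Probability of ever entering `unsafe` from each state under a fixed policy.

    transitions[x] is a list of (next_state, probability) pairs.
    Iterating from zero converges to the least fixed point of
    v(x) = 1 if x is unsafe, else sum_y P(y | x) * v(y).
    """
    v = [1.0 if x in unsafe else 0.0 for x in range(n_states)]
    for _ in range(max_iters):
        new_v = [
            1.0 if x in unsafe else sum(p * v[y] for y, p in transitions[x])
            for x in range(n_states)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def safe_set(v, alpha):
    """States whose probability of reaching the unsafe set is at most alpha."""
    return {x for x, prob in enumerate(v) if prob <= alpha}


# Hypothetical 5-state chain: state 0 is unsafe, state 4 is an absorbing goal.
# The assumed policy moves right with probability 0.8 and slips left with 0.2.
n_states = 5
unsafe = {0}
transitions = {
    0: [(0, 1.0)],
    1: [(2, 0.8), (0, 0.2)],
    2: [(3, 0.8), (1, 0.2)],
    3: [(4, 0.8), (2, 0.2)],
    4: [(4, 1.0)],
}

v = unsafe_reach_probability(transitions, unsafe, n_states)
safe = safe_set(v, alpha=0.1)
print("reach probabilities:", [round(p, 4) for p in v])
print("safe set:", sorted(safe))
```

In this toy setting, a policy with a lower probability of reaching the unsafe state from a given state would add that state to the safe set for the same `alpha`, which is the sense in which a sequence of improving safe policies can expand the safe set.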
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Software Reliability and Analysis Research
