Learn With Imagination: Safe Set Guided State-wise Constrained Policy Optimization
Yifan Sun, Feihan Li, Weiye Zhao, Rui Chen, Tianhao Wei, Changliu Liu

TL;DR
This paper introduces S-3PO, a novel safe reinforcement learning algorithm that guarantees zero safety violations during training by combining safety monitoring with imaginary cost enforcement, enabling safe exploration in complex tasks.
Contribution
S-3PO is the first algorithm to achieve zero training violations in state-wise constrained RL using a safety monitor and imaginary cost enforcement.
Findings
Outperforms existing methods in high-dimensional robotics tasks
Achieves zero training violations during learning
Ensures safe exploration with black-box dynamics
Abstract
Deep reinforcement learning (RL) excels in various control tasks, yet the absence of safety guarantees hampers its real-world applicability. In particular, exploration during learning usually results in safety violations, as the RL agent learns from those mistakes. On the other hand, safe control techniques ensure persistent safety satisfaction but demand strong priors on system dynamics, which are usually hard to obtain in practice. To address these problems, we present Safe Set Guided State-wise Constrained Policy Optimization (S-3PO), a pioneering algorithm that generates state-wise safe optimal policies with zero training violations, i.e., learning without mistakes. S-3PO first employs a safety-oriented monitor with black-box dynamics to ensure safe exploration. It then enforces an "imaginary" cost for the RL agent to converge to optimal behaviors within safety constraints. S-3PO…
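The two steps named in the abstract (a safety monitor that queries black-box dynamics, then an "imaginary" cost fed back to the learner) can be illustrated with a toy sketch. This is not the authors' implementation. The 1-D dynamics, the safety index `safety_index`, the candidate-action search, and the choice to define the imaginary cost as the predicted violation of the rejected action are all assumptions made for illustration.

```python
import numpy as np

def safety_index(state):
    # Hypothetical safety index: positive once the 1-D position exceeds 1.0.
    return float(state[0] - 1.0)

def black_box_step(state, action):
    # Hypothetical black-box dynamics (single integrator, dt = 0.1).
    # The monitor only calls this function; it never inspects its form.
    return state + 0.1 * action

def monitor(state, action, candidates):
    """Return (executed_action, imaginary_cost) for a proposed action."""
    phi_next = safety_index(black_box_step(state, action))
    if phi_next <= 0.0:
        # Proposed action keeps the system safe: execute it, no cost.
        return action, 0.0
    # Proposed action would violate the constraint. Execute the closest
    # safe candidate instead, and record the violation that would have
    # occurred as an imaginary cost for the learner (assumed definition).
    safe = [a for a in candidates
            if safety_index(black_box_step(state, a)) <= 0.0]
    if not safe:
        raise RuntimeError("no safe candidate action found")
    executed = min(safe, key=lambda a: float(np.linalg.norm(a - action)))
    return executed, phi_next

# Usage: near the boundary, the unsafe proposal is replaced and penalized.
candidates = [np.array([v]) for v in (-1.0, 0.0, 1.0)]
executed, cost = monitor(np.array([0.95]), np.array([1.0]), candidates)
```

In this sketch the environment never experiences the unsafe transition, while the learner still receives a nonzero cost signal for having proposed it, which is the intuition behind learning "without mistakes".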
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Age of Information Optimization
