Learn With Imagination: Safe Set Guided State-wise Constrained Policy Optimization

Yifan Sun; Feihan Li; Weiye Zhao; Rui Chen; Tianhao Wei; Changliu Liu

arXiv:2308.13140 · cs.RO · June 4, 2025

TL;DR

This paper introduces S-3PO, a novel safe reinforcement learning algorithm that guarantees zero safety violations during training by combining safety monitoring with imaginary cost enforcement, enabling safe exploration in complex tasks.

Contribution

S-3PO is the first algorithm to achieve zero training violations in state-wise constrained RL using a safety monitor and imaginary cost enforcement.

Findings

01 Outperforms existing methods in high-dimensional robotics tasks
02 Achieves zero training violations during learning
03 Ensures safe exploration with black-box dynamics

Abstract

Deep reinforcement learning (RL) excels in various control tasks, yet the absence of safety guarantees hampers its real-world applicability. In particular, exploration during learning usually results in safety violations, while the RL agent learns from those mistakes. On the other hand, safe control techniques ensure persistent safety satisfaction but demand strong priors on system dynamics, which are usually hard to obtain in practice. To address these problems, we present Safe Set Guided State-wise Constrained Policy Optimization (S-3PO), a pioneering algorithm generating state-wise safe optimal policies with zero training violations, i.e., learning without mistakes. S-3PO first employs a safety-oriented monitor with black-box dynamics to ensure safe exploration. It then enforces an "imaginary" cost for the RL agent to converge to optimal behaviors within safety constraints. S-3PO…
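The two stages named in the abstract (a safety monitor that queries black-box dynamics, then an "imaginary" cost fed back to the learner) can be pictured with a toy sketch. This excerpt does not give the paper's actual definitions, so everything below is an illustrative assumption rather than the authors' method: the 1-D state, the `safety_index` function, the `black_box_step` stand-in dynamics, the candidate-action search, and the choice to define the imaginary cost as the violation the proposed action would have caused.

```python
def safety_index(state):
    # Hypothetical safety index: values > 0 mean unsafe.
    # Toy constraint (assumed): the 1-D state must satisfy |x| <= 1.
    return abs(state) - 1.0


def black_box_step(state, action):
    # Stand-in for black-box dynamics: the monitor may only query it,
    # not inspect it. Assumed here to be simple integration.
    return state + action


def monitor(state, proposed, candidates):
    """Return (executed_action, imaginary_cost) for one step.

    If the proposed action keeps the next state safe, it is executed
    unchanged with zero cost. Otherwise the closest safe candidate is
    executed, and the violation the proposed action *would* have caused
    is reported as an imaginary cost (an assumed definition).
    """
    predicted = safety_index(black_box_step(state, proposed))
    if predicted <= 0.0:
        return proposed, 0.0

    safe = [a for a in candidates
            if safety_index(black_box_step(state, a)) <= 0.0]
    if not safe:
        raise ValueError("no safe candidate action available")

    executed = min(safe, key=lambda a: abs(a - proposed))
    return executed, predicted


if __name__ == "__main__":
    actions = [-0.5, 0.0, 0.25, 0.5]
    print(monitor(0.0, 0.5, actions))   # safe proposal, passed through
    print(monitor(0.5, 1.0, actions))   # unsafe proposal, corrected
```

In this sketch the environment never sees the unsafe action, so no real violation occurs, while the learner still receives a penalty signal it could constrain during policy optimization. How S-3PO actually constructs the monitor and the cost is specified in the paper itself.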

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Age of Information Optimization