Absolute State-wise Constrained Policy Optimization: High-Probability State-wise Constraints Satisfaction
Weiye Zhao, Feihan Li, Yifan Sun, Yujie Wang, Rui Chen, Tianhao Wei, Changliu Liu

TL;DR
This paper introduces ASCPO, a new reinforcement learning algorithm that guarantees high-probability satisfaction of state-wise safety constraints in stochastic systems, outperforming existing methods in robot control tasks.
Contribution
We propose ASCPO, a policy optimization algorithm that enforces state-wise safety with high probability without strong assumptions, advancing safe RL in real-world applications.
Findings
ASCPO achieves high-probability safety constraint satisfaction in continuous control tasks.
ASCPO outperforms existing safe RL methods in robot locomotion experiments.
The approach is effective for complex, real-world-like safety constraints.
Abstract
Enforcing state-wise safety constraints is critical for the application of reinforcement learning (RL) in real-world problems, such as autonomous driving and robot manipulation. However, existing safe RL methods only enforce state-wise constraints in expectation or enforce hard state-wise constraints with strong assumptions. The former does not exclude the probability of safety violations, while the latter is impractical. Our insight is that although it is intractable to guarantee hard state-wise constraints in a model-free setting, we can enforce state-wise safety with high probability while excluding strong assumptions. To accomplish the goal, we propose Absolute State-wise Constrained Policy Optimization (ASCPO), a novel general-purpose policy search algorithm that guarantees high-probability state-wise constraint satisfaction for stochastic systems. We demonstrate the effectiveness…
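The abstract contrasts constraints that hold in expectation with constraints that hold with high probability. The sketch below illustrates that distinction only; it is not the ASCPO algorithm. It assumes a batch of sampled state-wise costs and a cost limit, and uses Cantelli's inequality (a standard one-sided Chebyshev bound) to cap the fraction of samples at or above the limit using only the batch mean and standard deviation. The function names `satisfies_in_expectation` and `cantelli_violation_bound` are hypothetical, introduced here for illustration.

```python
import numpy as np


def satisfies_in_expectation(costs, limit):
    """True if the average cost is within the limit (expectation-style check)."""
    return float(np.mean(costs)) <= limit


def cantelli_violation_bound(costs, limit):
    """Upper bound on the fraction of costs >= limit, via Cantelli's inequality.

    Uses only the batch mean and (population) standard deviation.
    Illustrative assumption: this is a generic bound, not the paper's method.
    """
    mean = float(np.mean(costs))
    std = float(np.std(costs))
    if limit <= mean:
        return 1.0  # the bound is vacuous unless the limit exceeds the mean
    if std == 0.0:
        return 0.0  # every sample equals the mean, which is below the limit
    k = (limit - mean) / std
    return 1.0 / (1.0 + k * k)


# Two hypothetical cost batches with the same mean but different spread.
steady = [0.5] * 10
risky = [0.0] * 5 + [1.0] * 5
limit = 0.8

# Both pass the expectation-style check ...
both_ok = satisfies_in_expectation(steady, limit) and satisfies_in_expectation(risky, limit)

# ... but half of the risky batch actually exceeds the limit.
risky_fraction = float(np.mean(np.array(risky) >= limit))
```

Both batches pass the expectation check, yet only the zero-variance batch gets a violation bound of zero. This is why a high-probability guarantee has to account for the spread of the cost as well as its mean.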
Taxonomy
Topics: Simulation Techniques and Applications · Formal Methods in Verification · Software Reliability and Analysis Research
