Proactive Constrained Policy Optimization with Preemptive Penalty
Ning Yang, Pengyu Wang, Guoqing Liu, Haifeng Zhang, Pin Lv, Jun Wang

TL;DR
This paper introduces PCPO, a proactive constrained policy optimization method that uses preemptive penalties and boundary-aware exploration to improve safety and stability in reinforcement learning.
Contribution
The paper proposes a novel PCPO method with preemptive penalties and boundary-aware rewards, providing theoretical bounds and demonstrating improved stability over traditional approaches.
Findings
In experiments, PCPO trains more stably than traditional approaches.
Theoretical bounds for duality gap and convergence are established.
Experimental results show robust constraint adherence.
Abstract
Safe Reinforcement Learning (RL) often faces significant issues such as constraint violations and instability, necessitating the use of constrained policy optimization, which seeks optimal policies while ensuring adherence to specific constraints like safety. Typically, constrained optimization problems are addressed by the Lagrangian method, a post-violation remedial approach that may result in oscillations and overshoots. Motivated by this, we propose a novel method named Proactive Constrained Policy Optimization (PCPO) that incorporates a preemptive penalty mechanism. This mechanism integrates barrier items into the objective function as the policy nears the boundary, imposing a cost. Meanwhile, we introduce a constraint-aware intrinsic reward to guide boundary-aware exploration, which is activated only when the policy approaches the constraint boundary. We establish theoretical…
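To make the preemptive penalty idea concrete, here is a minimal Python sketch of a barrier term that stays at zero while the policy's expected cost is well inside the constraint and grows as the cost nears the limit, together with an intrinsic bonus gated on the same boundary band. This is an illustration only, not the paper's formulation: the log-barrier form, the linear bonus, and the names `preemptive_penalty`, `boundary_bonus`, `margin`, `weight`, and `scale` are all assumptions, since the abstract does not give the exact expressions.

```python
import math


def preemptive_penalty(cost, limit, margin=0.2, weight=1.0):
    """Hypothetical log-barrier penalty, active only near the constraint boundary.

    Assumed form, not taken from the paper: zero while cost <= (1 - margin) * limit,
    then growing without bound as cost approaches limit.
    """
    threshold = (1.0 - margin) * limit
    if cost <= threshold:
        return 0.0  # safely inside the feasible region: no penalty
    if cost >= limit:
        return float("inf")  # at or past the boundary: barrier is infinite
    slack = limit - cost  # remaining distance to the boundary
    band = limit - threshold  # width of the region where the barrier is active
    # Equals 0 at the threshold (slack == band) and diverges as slack -> 0.
    return -weight * math.log(slack / band)


def boundary_bonus(cost, limit, margin=0.2, scale=0.1):
    """Hypothetical constraint-aware intrinsic reward.

    Assumed form: inactive away from the boundary; inside the band it rewards
    keeping slack, so exploration near the boundary is steered toward safety.
    """
    threshold = (1.0 - margin) * limit
    if cost <= threshold or cost >= limit:
        return 0.0  # gate: only active while approaching the boundary
    return scale * (limit - cost) / (limit - threshold)


def shaped_objective(reward_return, cost, limit):
    """Reward return plus the gated bonus, minus the preemptive penalty."""
    return (reward_return
            + boundary_bonus(cost, limit)
            - preemptive_penalty(cost, limit))
```

With a cost limit of 1.0 and the default `margin`, a policy with expected cost 0.5 is left untouched, while one at 0.9 is already penalized before any violation occurs. A Lagrangian multiplier, by contrast, only starts to react after the constraint has been exceeded.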
