Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Linrui Zhang; Li Shen; Long Yang; Shixiang Chen; Bo Yuan; Xueqian Wang; Dacheng Tao

arXiv:2205.11814·cs.LG·June 20, 2022·6 cites

TL;DR

This paper introduces P3O, a novel safe reinforcement learning algorithm that simplifies constrained policy optimization using penalty functions, enabling efficient policy updates and extending to multi-constraint and multi-agent scenarios.

Contribution

P3O provides a unified unconstrained formulation for constrained policy optimization, with theoretical guarantees and extensions to complex multi-constraint and multi-agent settings.
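The idea of an "unconstrained formulation" can be written in generic form. The notation here is assumed and does not come from this page:

- J_R is the expected reward return.
- J_{C_i} is the expected return of the i-th cost.
- d_i is the limit for the i-th cost.
- κ > 0 is the penalty factor.

An exact (ReLU-type) penalty replaces the constrained problem with a single unconstrained minimization:

```latex
\max_{\pi}\; J_R(\pi)
\quad \text{s.t.} \quad J_{C_i}(\pi) \le d_i,\;\; i = 1,\dots,m
\qquad \Longrightarrow \qquad
\min_{\pi}\; -J_R(\pi) + \kappa \sum_{i=1}^{m} \max\{0,\; J_{C_i}(\pi) - d_i\}
```

As the abstract summarizes it, P3O applies this idea to clipped, sample-based surrogates of these quantities rather than to the exact returns.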

Findings

01

P3O outperforms existing algorithms in both reward and safety-constraint satisfaction.

02

Theoretical proof that the penalized formulation is exact with a finite penalty factor.

03

Effective in multi-constraint and multi-agent tasks.
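Finding 02 concerns exactness with a *finite* penalty factor. The toy problem below is a hypothetical illustration of that concept, not the paper's proof or setting:

- Minimize f(x) = (x − 2)² subject to x ≤ 1.
- The constrained optimum is x* = 1, and its Lagrange multiplier is 2.
- Any penalty factor κ of at least 2 makes the unconstrained ReLU-penalized minimizer coincide with x*.
- A smaller κ lets the minimizer drift into the infeasible region.

```python
def penalized_objective(x, kappa):
    """F(x) = f(x) + kappa * max(0, g(x)) with f(x) = (x - 2)^2 and g(x) = x - 1 (toy problem)."""
    return (x - 2.0) ** 2 + kappa * max(0.0, x - 1.0)


def grid_argmin(kappa, lo=-10000, hi=30000, scale=10000):
    """Brute-force minimizer of the penalized objective over the grid x = i / scale."""
    best_x, best_val = None, float("inf")
    for i in range(lo, hi + 1):
        x = i / scale
        val = penalized_objective(x, kappa)
        if val < best_val:
            best_x, best_val = x, val
    return best_x


for kappa in (1.0, 3.0):
    print(kappa, grid_argmin(kappa))
# 1.0 1.5  -> penalty too small: minimizer violates x <= 1
# 3.0 1.0  -> finite penalty large enough: minimizer is exactly x* = 1
```

A brute-force grid is used so that the result does not depend on any optimizer. In the reinforcement-learning setting the same threshold behavior is what "exact with a finite penalty factor" refers to.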

Abstract

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle to achieve efficient policy updates with hard constraint satisfaction. In this paper, we propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem. Specifically, P3O uses a simple yet effective penalty function to eliminate the cost constraints and removes the trust-region constraint through the clipped surrogate objective. We theoretically prove the exactness of the proposed method with a finite penalty factor and provide a worst-case analysis of the approximation error when the objective is evaluated on sample trajectories. Moreover, we extend P3O to more challenging multi-constraint and multi-agent scenarios which…
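The abstract combines two mechanisms: a penalty function for the cost constraints and a clipped surrogate in place of the trust region. The sketch below shows how they might fit into one loss. It is written under stated assumptions and is not the authors' implementation:

- The reward term is the standard PPO clipped surrogate.
- Each cost constraint is assumed to take the CPO-style linearized form J_C(π_k) + E[r·A_C]/(1 − γ) ≤ d, rescaled by (1 − γ).
- The cost term uses max instead of min, so clipping never lowers the estimated cost.
- Each constraint enters through a ReLU penalty, and penalties are summed over constraints for the multi-constraint case.

The function names and the defaults kappa=20.0, gamma=0.99 and eps=0.2 are hypothetical.

```python
def clip(r, eps):
    """Clip a probability ratio to [1 - eps, 1 + eps]."""
    return min(max(r, 1.0 - eps), 1.0 + eps)


def reward_surrogate(ratios, adv, eps):
    """PPO-style clipped surrogate for reward (to be maximized), averaged over samples."""
    terms = [min(r * a, clip(r, eps) * a) for r, a in zip(ratios, adv)]
    return sum(terms) / len(terms)


def cost_surrogate(ratios, cost_adv, eps):
    """Pessimistic clipped surrogate for cost: max() means clipping never lowers the estimate."""
    terms = [max(r * a, clip(r, eps) * a) for r, a in zip(ratios, cost_adv)]
    return sum(terms) / len(terms)


def p3o_style_loss(ratios, adv, cost_advs, cost_returns, limits,
                   kappa=20.0, gamma=0.99, eps=0.2):
    """Hypothetical P3O-style loss: -reward surrogate + kappa * sum of ReLU constraint penalties.

    ratios       : pi_theta(a|s) / pi_k(a|s) per sample
    adv          : reward advantages per sample
    cost_advs    : one list of cost advantages per constraint
    cost_returns : current cost return J_C(pi_k) per constraint
    limits       : cost limit d per constraint
    """
    loss = -reward_surrogate(ratios, adv, eps)
    for cost_adv, j_c, d in zip(cost_advs, cost_returns, limits):
        # Assumed rescaled surrogate constraint; a value <= 0 means "predicted satisfied".
        violation = cost_surrogate(ratios, cost_adv, eps) + (1.0 - gamma) * (j_c - d)
        # ReLU penalty: contributes only when the surrogate constraint is violated.
        loss += kappa * max(0.0, violation)
    return loss


ratios, adv = [1.0, 1.0], [1.0, 1.0]
print(p3o_style_loss(ratios, adv, [[0.0, 0.0]], [20.0], [25.0]))
# -1.0: the constraint is satisfied, so the penalty is inactive
```

In practice the loss would be computed on tensors and minimized with gradient descent. Plain Python floats keep the sketch dependency-free.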

Taxonomy

Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety