Convergent Policy Optimization for Safe Reinforcement Learning
Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang

TL;DR
This paper introduces a convergent policy optimization method for safe reinforcement learning with nonlinear function approximation, transforming a nonconvex constrained problem into a sequence of convex surrogates.
Contribution
It proposes a novel surrogate convex optimization approach that guarantees convergence to a stationary point in safe reinforcement learning with nonconvex constraints.
Findings
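The solutions to the convex surrogate problems converge to a stationary point of the original nonconvex problem.
The algorithm is applied to examples of optimal control and multi-agent RL with safety constraints.
The convergence guarantee covers nonlinear function approximation, where both the objective and the constraint are nonconvex.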
Abstract
We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions. For such a problem, we construct a sequence of surrogate convex constrained optimization problems by replacing the nonconvex functions locally with convex quadratic functions obtained from policy gradient estimators. We prove that the solutions to these surrogate problems converge to a stationary point of the original nonconvex problem. Furthermore, to extend our theoretical results, we apply our algorithm to examples of optimal control and multi-agent reinforcement learning with safety constraints.
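The surrogate construction described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation. It assumes exact gradients in place of policy gradient estimators, isotropic quadratic surrogates with a hypothetical curvature parameter `tau`, a fixed damping step `eta`, and a hand-picked two-dimensional objective and constraint. All function names are illustrative.

```python
import numpy as np

A = np.array([0.2, 0.0])  # hypothetical target point for the toy objective


def objective(theta):
    # Nonconvex toy objective (assumption of this sketch).
    return float(np.sum((theta - A) ** 2) + 0.5 * np.sin(3.0 * theta[0]))


def objective_grad(theta):
    out = 2.0 * (theta - A)
    out[0] += 1.5 * np.cos(3.0 * theta[0])
    return out


def constraint(theta):
    # Nonconvex toy constraint: feasible set is the outside of the unit disk.
    return float(1.0 - theta @ theta)


def constraint_grad(theta):
    return -2.0 * theta


def solve_surrogate(gf, c, gc, tau):
    """Solve the convex surrogate problem in the step d:

        minimize    gf.d + tau * |d|^2
        subject to  c + gc.d + tau * |d|^2 <= 0

    using bisection on the Lagrange multiplier of the constraint.
    """
    def step(lam):
        # Minimizer of the Lagrangian for a fixed multiplier lam >= 0.
        return -(gf + lam * gc) / (2.0 * tau * (1.0 + lam))

    def surrogate_constraint(lam):
        d = step(lam)
        return c + gc @ d + tau * (d @ d)

    if surrogate_constraint(0.0) <= 0.0:
        return step(0.0)  # constraint inactive

    lo, hi = 0.0, 1.0
    while surrogate_constraint(hi) > 0.0:
        hi *= 2.0
        if hi > 1e12:
            # Surrogate looks infeasible. As a fallback (an assumption of
            # this sketch), minimize the constraint surrogate alone.
            return -gc / (2.0 * tau)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if surrogate_constraint(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return step(hi)  # hi always satisfies the surrogate constraint


def successive_convex_approximation(theta0, tau=4.0, eta=0.5, iters=300):
    theta = np.array(theta0, dtype=float)
    for _ in range(iters):
        d = solve_surrogate(objective_grad(theta), constraint(theta),
                            constraint_grad(theta), tau)
        theta = theta + eta * d  # damped move toward the surrogate solution
    return theta


if __name__ == "__main__":
    theta = successive_convex_approximation([1.5, 1.0])
    print(theta, objective(theta), constraint(theta))
```

In this toy setting `tau` is chosen at least half the gradient Lipschitz constant of both functions, so each quadratic surrogate upper-bounds the function it replaces, and an iterate that starts feasible stays feasible. With noisy gradient estimates, as in the reinforcement learning setting, that property would not hold automatically.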
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems
