Convergent Policy Optimization for Safe Reinforcement Learning

Ming Yu; Zhuoran Yang; Mladen Kolar; Zhaoran Wang

arXiv:1910.12156 · cs.LG · October 29, 2019 · 31 cites

TL;DR

This paper introduces a convergent policy optimization method for safe reinforcement learning with nonlinear function approximation, transforming a nonconvex constrained problem into a sequence of convex surrogates.

Contribution

It proposes a novel surrogate convex optimization approach that guarantees convergence to a stationary point in safe reinforcement learning with nonconvex constraints.

Findings

01

Solutions of the convex surrogate problems converge to a stationary point of the original nonconvex problem.

02

The algorithm is applied to examples of optimal control and multi-agent RL with safety constraints.

03

The convergence guarantee covers nonlinear function approximation, where both the objective and the constraint are nonconvex.

Abstract

We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions. For such a problem, we construct a sequence of surrogate convex constrained optimization problems by replacing the nonconvex functions locally with convex quadratic functions obtained from policy gradient estimators. We prove that the solutions to these surrogate problems converge to a stationary point of the original nonconvex problem. Furthermore, to extend our theoretical results, we apply our algorithm to examples of optimal control and multi-agent reinforcement learning with safety constraints.
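The surrogate construction described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the toy functions `f` and `g`, the curvature parameter `tau`, the damping factor `eta`, and the helper `solve_surrogate` are hypothetical names, exact gradients stand in for policy gradient estimators, and the infeasibility fallback is one simple choice among several. At each iterate, the nonconvex objective and constraint are replaced by convex quadratics built from their gradients, the resulting convex problem is solved for a step, and a damped step is taken.

```python
import numpy as np

def f(theta):
    # Toy nonconvex objective (a stand-in for the expected cost to minimize).
    return 0.5 * theta @ theta + 0.3 * np.sin(2.0 * theta[0])

def grad_f(theta):
    return theta + np.array([0.6 * np.cos(2.0 * theta[0]), 0.0])

def g(theta):
    # Toy nonconvex constraint g(theta) <= 0: stay outside the unit disc.
    return 1.0 - theta @ theta

def grad_g(theta):
    return -2.0 * theta

def solve_surrogate(a, b, c, tau, iters=100):
    """Solve the convex surrogate over the step d:
    minimize a.d + tau*|d|^2  subject to  c + b.d + tau*|d|^2 <= 0.
    Both quadratics share the curvature tau, so the Lagrangian minimizer
    has a closed form and only the multiplier lam needs a 1-D search."""
    def step(lam):
        return -(a + lam * b) / (2.0 * tau * (1.0 + lam))

    def cons(lam):
        d = step(lam)
        return c + b @ d + tau * d @ d

    if cons(0.0) <= 0.0:
        return step(0.0)  # unconstrained surrogate minimizer is feasible
    fallback = -b / (2.0 * tau)  # minimizer of the constraint surrogate
    if c - b @ b / (4.0 * tau) >= 0.0:
        return fallback  # surrogate has no strictly feasible point (assumed fallback)
    hi = 1.0
    while cons(hi) > 0.0 and hi < 1e12:
        hi *= 2.0  # grow the bracket until the constraint is satisfied
    if cons(hi) > 0.0:
        return fallback
    lo = 0.0
    for _ in range(iters):  # bisection on lam; cons is nonincreasing in lam
        mid = 0.5 * (lo + hi)
        if cons(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return step(hi)  # use the feasible end of the bracket

def successive_convex_approximation(theta0, tau=1.5, eta=0.5, steps=200):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        # Build the quadratic surrogates at the current iterate and solve them.
        d = solve_surrogate(grad_f(theta), grad_g(theta), g(theta), tau)
        theta = theta + eta * d  # damped step toward the surrogate solution
    return theta

if __name__ == "__main__":
    start = np.array([1.5, 0.5])
    final = successive_convex_approximation(start)
    print("objective:", f(start), "->", f(final))
    print("constraint value at final iterate:", g(final))
```

In this toy setup `tau` is chosen large enough that each quadratic upper-bounds the function it replaces, so a feasible starting point keeps every iterate feasible. With noisy gradient estimates that property would not hold exactly, which is where a convergence analysis such as the paper's becomes necessary.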

Code & Models

Repositories

ming93/Safe_reinforcement_learning (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems