Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

Chang Tian; An Liu; Guang Huang; Wu Luo

arXiv:2105.12545 · cs.LG · April 20, 2022

TL;DR

This paper introduces SCAOPO, an off-policy optimization algorithm for constrained reinforcement learning that efficiently solves CMDPs by iteratively approximating the problem with convex surrogates, enabling online learning with experience reuse.

Contribution

The paper presents a novel successive convex approximation method for off-policy constrained RL that guarantees convergence to KKT points and reduces implementation costs.

Findings

01 Converges to KKT points under time-varying distributions

02 Enables online learning with experience reuse

03 Proven convergence with feasible initial points

Abstract

We propose a successive convex approximation based off-policy optimization (SCAOPO) algorithm to solve the general constrained reinforcement learning problem, which is formulated as a constrained Markov decision process (CMDP) in the average-cost setting. SCAOPO solves a sequence of convex objective/feasibility optimization problems obtained by replacing the objective and constraint functions of the original problem with convex surrogate functions. At each iteration, the convex surrogate problem can be solved efficiently by the Lagrange dual method, even when the policy is parameterized by a high-dimensional function. Moreover, SCAOPO can reuse old experiences from previous updates, thereby significantly reducing the implementation cost when deployed in real-world engineering systems that need to learn the environment online. In spite of the time-varying state…
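The per-iteration step described in the abstract, solving a convex surrogate problem by the Lagrange dual method, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper or its repository: the quadratic surrogate form `f_i(theta) = c_i + g_i^T (theta - theta_t) + tau * ||theta - theta_t||^2`, the function name `solve_surrogate`, and the parameters `tau`, `step`, and `iters`. With this surrogate form the inner minimization over `theta` has a closed form, so only the low-dimensional multipliers are iterated, which is one plausible reading of why the dual approach stays cheap for high-dimensional policies. The sketch covers only the objective problem, not the feasibility problem.

```python
import numpy as np

def solve_surrogate(theta_t, consts, grads, tau=0.5, step=0.5, iters=500):
    """Hypothetical sketch: minimize f_0 subject to f_i <= 0 for i >= 1, where
    f_i(theta) = consts[i] + grads[i] @ (theta - theta_t) + tau * ||theta - theta_t||^2.
    Solved by projected gradient ascent on the Lagrange dual."""
    theta_t = np.asarray(theta_t, dtype=float)
    consts = np.asarray(consts, dtype=float)   # shape (m + 1,)
    grads = np.asarray(grads, dtype=float)     # shape (m + 1, n)
    lam = np.zeros(len(consts) - 1)            # one multiplier per constraint

    def primal(lam):
        # Closed-form minimizer of the Lagrangian in theta for fixed multipliers.
        g = grads[0] + lam @ grads[1:]
        return theta_t - g / (2.0 * tau * (1.0 + lam.sum()))

    def surrogates(theta):
        # Values of all surrogate functions f_0, ..., f_m at theta.
        d = theta - theta_t
        return consts + grads @ d + tau * (d @ d)

    for _ in range(iters):
        theta = primal(lam)
        # The dual gradient is the constraint value at the primal minimizer;
        # project onto lam >= 0 after each ascent step.
        lam = np.maximum(0.0, lam + step * surrogates(theta)[1:])
    return primal(lam), lam

# Toy instance: objective pulls theta toward (2, 0); constraint is ||theta|| <= 1.
theta, lam = solve_surrogate(
    np.zeros(2), consts=[0.0, -0.5], grads=[[-2.0, 0.0], [0.0, 0.0]]
)
```

In this toy instance the constraint is active at the solution, so the multiplier settles at a positive value and the returned `theta` lies on the constraint boundary. The fixed dual step size is a simplification; a practical solver would likely need a step-size rule or a different dual update.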

Code & Models

Repositories

Kaijin1996/SCAOPO
PyTorch · Official
