Policy Optimization for Constrained MDPs with Provable Fast Global Convergence

Tao Liu; Ruida Zhou; Dileep Kalathil; P. R. Kumar; Chao Tian

arXiv:2111.00552 · cs.LG · February 7, 2022 · 1 citation

TL;DR

This paper introduces the PMD-PD algorithm, a policy gradient method for constrained MDPs that provably converges faster than previous approaches, with experimental validation showing improved performance.

Contribution

The paper proposes a novel policy mirror descent-primal dual algorithm that achieves a faster $O(\log(T)/T)$ convergence rate for constrained MDPs, improving upon prior $O(1/\sqrt{T})$ results.
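To give a feel for the gap between the two rates, the sketch below evaluates $1/\sqrt{T}$ and $\log(T)/T$ at a few iteration counts. It ignores the problem-dependent constants hidden by the $O(\cdot)$ notation, so the numbers indicate scaling only and are not error values reported in the paper. The function names `prior_rate` and `new_rate` are illustrative.

```python
import math

def prior_rate(T):
    """Scaling of the earlier bound, 1/sqrt(T), with constants dropped."""
    return 1.0 / math.sqrt(T)

def new_rate(T):
    """Scaling of the improved bound, log(T)/T, with constants dropped."""
    return math.log(T) / T

# Compare the two scalings at a few iteration counts.
for T in (10**2, 10**4, 10**6):
    ratio = prior_rate(T) / new_rate(T)
    print(f"T={T:>8}: 1/sqrt(T)={prior_rate(T):.2e}  "
          f"log(T)/T={new_rate(T):.2e}  ratio={ratio:.1f}")
```

The ratio between the two grows roughly like $\sqrt{T}/\log(T)$, so the advantage of the faster rate widens as the iteration count increases.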

Findings

01

Faster $O(\log(T)/T)$ convergence rate achieved.

02

Algorithm outperforms existing policy gradient methods in experiments.

03

Extensions handle zero constraint violation and sample-based estimation.

Abstract

We address the problem of finding the optimal policy of a constrained Markov decision process (CMDP) using a gradient descent-based algorithm. Previous results have shown that a primal-dual approach can achieve an $O(1/\sqrt{T})$ global convergence rate for both the optimality gap and the constraint violation. We propose a new algorithm, called the policy mirror descent-primal dual (PMD-PD) algorithm, that can provably achieve a faster $O(\log(T)/T)$ convergence rate for both the optimality gap and the constraint violation. For the primal (policy) update, the PMD-PD algorithm utilizes a modified value function and performs natural policy gradient steps, which are equivalent to mirror descent steps with appropriate regularization. For the dual update, the PMD-PD algorithm uses modified Lagrange multipliers to ensure a faster convergence rate. We also present two extensions…
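The link between natural policy gradient steps and mirror descent mentioned in the abstract can be illustrated in a single state. The sketch below is not the PMD-PD update itself: it assumes a KL-divergence regularizer, a reward-maximizing sign convention, and hypothetical action values `q_values` standing in for the paper's modified value function. Under those assumptions, maximizing $\eta \langle q, p \rangle - \mathrm{KL}(p \,\|\, \pi_{\text{old}})$ over the probability simplex has the closed-form solution $p(a) \propto \pi_{\text{old}}(a)\exp(\eta\, q(a))$, which is the multiplicative form commonly associated with natural policy gradient under a tabular softmax parameterization. The code checks the closed form numerically against random candidate policies.

```python
import math
import random

def kl(p, q):
    """KL(p || q) for distributions on the same finite set (q has full support)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mirror_descent_step(pi_old, q_values, eta):
    """Closed-form maximizer of eta*<q, p> - KL(p || pi_old) over the simplex."""
    weights = [p * math.exp(eta * q) for p, q in zip(pi_old, q_values)]
    total = sum(weights)
    return [w / total for w in weights]

def objective(p, pi_old, q_values, eta):
    """KL-regularized linear objective that the step above maximizes."""
    return eta * sum(pi * q for pi, q in zip(p, q_values)) - kl(p, pi_old)

def random_simplex_point(n, rng):
    """Sample a random distribution over n actions."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
pi_old = [0.5, 0.3, 0.2]      # current policy in one state (illustrative)
q_values = [1.0, 0.0, 2.0]    # hypothetical action values, not from the paper
eta = 0.5                     # step size (illustrative)

pi_new = mirror_descent_step(pi_old, q_values, eta)
best = objective(pi_new, pi_old, q_values, eta)

# No randomly drawn policy should score higher than the closed-form update.
beaten = any(
    objective(random_simplex_point(3, rng), pi_old, q_values, eta) > best + 1e-12
    for _ in range(10000)
)
print("updated policy:", [round(p, 4) for p in pi_new])
print("closed form beaten by a random candidate:", beaten)
```

The updated policy shifts probability toward the action with the largest value while staying close to `pi_old`; a smaller `eta` keeps the update closer to the previous policy.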

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms