Policy Optimization for Constrained MDPs with Provable Fast Global Convergence
Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

TL;DR
This paper introduces the PMD-PD algorithm, a policy gradient method for constrained MDPs that provably converges faster than previous approaches, with experimental validation showing improved performance.
Contribution
The paper proposes a novel policy mirror descent-primal dual algorithm that achieves a faster $\mathcal{O}(\log(T)/T)$ convergence rate for constrained MDPs, improving upon prior $\mathcal{O}(1/\sqrt{T})$ results.
Findings
Faster $\mathcal{O}(\log(T)/T)$ convergence rate achieved.
Algorithm outperforms existing policy gradient methods in experiments.
Extensions handle zero constraint violation and sample-based estimation.
Abstract
We address the problem of finding the optimal policy of a constrained Markov decision process (CMDP) using a gradient descent-based algorithm. Previous results have shown that a primal-dual approach can achieve an $\mathcal{O}(1/\sqrt{T})$ global convergence rate for both the optimality gap and the constraint violation. We propose a new algorithm, the policy mirror descent-primal dual (PMD-PD) algorithm, that can provably achieve a faster $\mathcal{O}(\log(T)/T)$ convergence rate for both the optimality gap and the constraint violation. For the primal (policy) update, the PMD-PD algorithm utilizes a modified value function and performs natural policy gradient steps, which is equivalent to a mirror descent step with appropriate regularization. For the dual update, the PMD-PD algorithm uses modified Lagrange multipliers to ensure a faster convergence rate. We also present two extensions…
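To make the primal-dual structure concrete, here is a minimal sketch of one iteration of a generic tabular primal-dual method with a natural policy gradient step, written as its equivalent KL mirror descent (multiplicative weights) form. This is the kind of baseline the abstract compares against, not PMD-PD itself: the modified value function and modified Lagrange multipliers are not specified in this summary and are not reproduced. The function names, the constraint direction (expected utility at least `b`), and the step sizes `eta` and `eta_dual` are all assumptions made for illustration.

```python
import numpy as np

def policy_eval(P, r, pi, gamma):
    """Exact V and Q for policy pi. P: (S, A, S), r: (S, A), pi: (S, A)."""
    S, _ = r.shape
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state transitions under pi
    r_pi = np.sum(pi * r, axis=1)           # expected one-step signal under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * (P @ V)                 # shape (S, A)
    return V, Q

def primal_dual_step(P, r, g, b, rho, pi, lam, gamma, eta, eta_dual):
    """One generic primal-dual iteration (hypothetical helper, not PMD-PD).

    Assumed problem: maximize V_r(rho) subject to V_g(rho) >= b.
    """
    _, Q_r = policy_eval(P, r, pi, gamma)
    V_g, Q_g = policy_eval(P, g, pi, gamma)
    Q_lagr = Q_r + lam * Q_g                # Q-function of the Lagrangian

    # Primal: KL mirror descent step, pi_new(a|s) proportional to pi(a|s) * exp(eta * Q)
    logits = np.log(pi) + eta * Q_lagr
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    new_pi /= new_pi.sum(axis=1, keepdims=True)

    # Dual: projected gradient step that keeps the multiplier non-negative
    new_lam = max(0.0, lam - eta_dual * (float(rho @ V_g) - b))
    return new_pi, new_lam
```

Starting from a uniform policy and `lam = 0.0`, the step can be applied repeatedly; the multiplier grows only while the assumed constraint is violated.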
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
