Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs

Washim Uddin Mondal; Vaneet Aggarwal

arXiv:2408.11513·cs.LG·May 4, 2026

TL;DR

This paper introduces a primal-dual regularized accelerated natural policy gradient algorithm (PDR-ANPG) for constrained MDPs and proves last-iterate convergence guarantees whose sample complexity depends on whether the policy class is complete.

Contribution

It proposes the PDR-ANPG algorithm with entropy and quadratic regularizers, providing the first last-iterate convergence guarantees for general parameterized policies in CMDPs.

Findings

01

Achieves last-iterate $\tilde{O}(ϵ^{-4})$ sample complexity for complete policy classes ( $ϵ_{bias} = 0$ ).

02

Reduces sample complexity to $\tilde{O}(ϵ^{-2})$ when the policy class is incomplete ( $ϵ_{bias} > 0$ ) and $ϵ < (ϵ_{bias})^{\frac{1}{6}}$.

03

Improves upon existing state-of-the-art guarantees for parameterized CMDPs.

Abstract

This paper focuses on learning a Constrained Markov Decision Process (CMDP) via general parameterized policies. We propose a Primal-Dual based Regularized Accelerated Natural Policy Gradient (PDR-ANPG) algorithm that uses entropy and quadratic regularizers to reach this goal. For parameterized policy classes with a transferred compatibility approximation error, $ϵ_{bias}$, PDR-ANPG achieves a last-iterate $ϵ$ optimality gap and $ϵ$ constraint violation with a sample complexity of $\tilde{O}(ϵ^{-2} \min\{ϵ^{-2}, ϵ_{bias}^{-\frac{1}{3}}\})$. If the class is incomplete ($ϵ_{bias} > 0$), then the sample complexity reduces to $\tilde{O}(ϵ^{-2})$ for $ϵ < (ϵ_{bias})^{\frac{1}{6}}$. Moreover, for complete policies with $ϵ_{bias} = 0$, our algorithm achieves a…
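To see how the two regimes in the abstract follow from the single bound, here is a minimal Python sketch that evaluates the order of $ϵ^{-2} \min\{ϵ^{-2}, ϵ_{bias}^{-\frac{1}{3}}\}$. It is only an illustration: the function name `sample_complexity_order` is hypothetical, and constants and logarithmic factors hidden by $\tilde{O}$ are ignored, so the returned values indicate scaling, not actual sample counts.

```python
import math


def sample_complexity_order(eps: float, eps_bias: float) -> float:
    """Order of eps^-2 * min(eps^-2, eps_bias^(-1/3)).

    Hypothetical helper for illustration; constants and log factors
    hidden by the O-tilde notation are dropped.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if eps_bias < 0:
        raise ValueError("eps_bias must be non-negative")
    if eps_bias == 0:
        # Complete policy class: eps_bias^(-1/3) is infinite,
        # so the min picks eps^-2 and the bound is eps^-4.
        return eps ** -4
    return eps ** -2 * min(eps ** -2, eps_bias ** (-1 / 3))


def in_reduced_regime(eps: float, eps_bias: float) -> bool:
    """True when eps < eps_bias^(1/6), i.e. the min picks eps_bias^(-1/3)."""
    return eps_bias > 0 and eps < eps_bias ** (1 / 6)


if __name__ == "__main__":
    for eps, eps_bias in [(0.1, 0.0), (0.5, 1e-6), (0.01, 1e-6)]:
        order = sample_complexity_order(eps, eps_bias)
        print(eps, eps_bias, in_reduced_regime(eps, eps_bias), order)
```

With $ϵ_{bias}$ held fixed, the factor $ϵ_{bias}^{-\frac{1}{3}}$ is a constant, which is why the bound scales as $ϵ^{-2}$ once $ϵ$ drops below $(ϵ_{bias})^{\frac{1}{6}}$.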
