Augmented Lagrangian Method for Last-Iterate Convergence for Constrained MDPs

Michael Lu; Max Qiushi Lin; Mo Chen; Sharan Vaswani

arXiv:2605.11694 · cs.LG · May 13, 2026

TL;DR

This paper introduces a practical augmented Lagrangian framework with provable last-iterate convergence for constrained MDPs, applicable to complex policies and continuous control tasks.

Contribution

It extends last-iterate convergence guarantees from tabular to non-linear policies using an inexact augmented Lagrangian approach with projected Q-ascent.

Findings

1. Proposed a scalable framework for constrained policy optimization.

2. Achieved last-iterate convergence in complex, non-linear policy settings.

3. Validated the approach on continuous control tasks.

Abstract

We study policy optimization for infinite-horizon, discounted constrained Markov decision processes (CMDPs). While existing theoretical guarantees typically hold for the mixture policy, deploying such a policy is computationally and memory intensive. This leads to a practical mismatch where a single (last-iterate) policy must be deployed. Recent theoretical works have thus focused on proving last-iterate convergence, but are largely limited to the tabular setting or to algorithmic variants that are rarely used in practice. To address this, we use the classic inexact augmented Lagrangian (AL) method from constrained optimization, and propose a general framework with provable last-iterate convergence for CMDPs. We first focus on the tabular setting and propose to solve the AL sub-problem with projected Q-ascent (PQA). Combining the theoretical guarantees…
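To make the inexact AL idea in the abstract concrete, here is a minimal sketch on a toy single-state CMDP (a constrained bandit): maximize the expected reward r·π subject to c·π ≥ b over the probability simplex. Everything in it is an assumption made for illustration: the reward vector `r`, constraint vector `c`, threshold `b`, penalty parameter `eta`, step size, iteration counts, and the helper names. The inner loop uses plain projected gradient ascent as a stand-in for the paper's PQA, whose exact update is not given in this excerpt.

```python
# Toy sketch of an inexact augmented Lagrangian (AL) loop.
# Hypothetical setup: one state, three actions, one constraint c.pi >= b.
# None of these numbers or function names come from the paper.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t  # keep the threshold of the last index that qualifies
    return [max(x - theta, 0.0) for x in v]

def augmented_lagrangian(r, c, b, eta=2.0, step=0.2,
                         outer_iters=30, inner_iters=200):
    n = len(r)
    pi = [1.0 / n] * n   # start from the uniform policy
    lam = 0.0            # dual variable for the constraint
    for _ in range(outer_iters):
        # Inexact inner solve: a fixed number of projected ascent steps on
        #   r.pi - (max(0, lam - eta * (c.pi - b))**2 - lam**2) / (2 * eta)
        for _ in range(inner_iters):
            slack = dot(c, pi) - b
            mult = max(0.0, lam - eta * slack)   # effective multiplier
            grad = [ri + mult * ci for ri, ci in zip(r, c)]
            pi = project_simplex([p + step * g for p, g in zip(pi, grad)])
        # Dual update, using the last inner iterate (no policy averaging)
        lam = max(0.0, lam - eta * (dot(c, pi) - b))
    return pi, lam

r = [1.0, 0.6, 0.0]   # per-action reward (assumed)
c = [0.0, 0.5, 1.0]   # per-action constraint value (assumed)
b = 0.25              # constraint threshold (assumed)

pi, lam = augmented_lagrangian(r, c, b)
print("policy:", [round(p, 3) for p in pi])
print("reward:", round(dot(r, pi), 3), "constraint value:", round(dot(c, pi), 3))
print("multiplier:", round(lam, 3))
```

For this toy instance the constrained optimum can be checked by hand: it mixes the first two actions equally, giving reward 0.8 with the constraint tight. The point of the sketch is that the policy returned is the final iterate itself rather than an average or mixture over iterates, which is the deployment setting the abstract describes.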
