Augmented Lagrangian Method for Last-Iterate Convergence for Constrained MDPs
Michael Lu, Max Qiushi Lin, Mo Chen, Sharan Vaswani

TL;DR
This paper introduces a practical augmented Lagrangian framework with provable last-iterate convergence for constrained MDPs, applicable to complex policies and continuous control tasks.
Contribution
It extends last-iterate convergence guarantees from tabular to non-linear policies using an inexact augmented Lagrangian approach with projected Q-ascent.
Findings
Proposed a scalable framework for constrained policy optimization.
Achieved last-iterate convergence in complex, non-linear policy settings.
Validated the approach on continuous control tasks.
Abstract
We study policy optimization for infinite-horizon, discounted constrained Markov decision processes (CMDPs). While existing theoretical guarantees typically hold for the mixture policy, deploying such a policy is computationally and memory intensive. This leads to a practical mismatch where a single (last-iterate) policy must be deployed. Recent theoretical works have thus focused on proving last-iterate convergence, but are largely limited to the tabular setting or to algorithmic variants that are rarely used in practice. To address this, we use the classic inexact augmented Lagrangian method from constrained optimization, and propose a general framework with provable last-iterate convergence for CMDPs. We first focus on the tabular setting and propose to solve the sub-problem with projected Q-ascent. Combining the theoretical guarantees…
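To make the inexact augmented Lagrangian idea concrete, here is a minimal sketch on a toy single-state problem: maximize the expected reward r·x over a distribution x on actions, subject to an expected-utility constraint u·x ≥ b. It is not the paper's algorithm. The inner solver is plain projected gradient ascent on the simplex, used as a stand-in for the sub-problem solver, and every name and default here (`inexact_alm`, `project_simplex`, `rho`, `eta`, the iteration counts) is a hypothetical choice made for illustration.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]                 # entries in decreasing order
    css = np.cumsum(u) - 1.0             # cumulative sums shifted by the target mass
    k = np.arange(1, len(v) + 1)
    rho_idx = k[u - css / k > 0][-1]     # number of entries that stay positive
    tau = css[rho_idx - 1] / rho_idx     # common shift applied to all entries
    return np.maximum(v - tau, 0.0)


def inexact_alm(r, u, b, rho=5.0, eta=0.1, outer_iters=20, inner_iters=50):
    """Toy augmented Lagrangian loop for: max r @ x  s.t.  u @ x >= b, x in simplex.

    Illustrative assumptions: the inner problem is solved only approximately by a
    fixed number of projected gradient ascent steps, and eta is assumed to satisfy
    eta <= 1 / (rho * ||u||^2) so that those steps are stable.
    """
    x = np.full(len(r), 1.0 / len(r))    # start from the uniform distribution
    lam = 0.0                            # multiplier for the constraint
    for _ in range(outer_iters):
        # Inexact inner solve: ascend the augmented Lagrangian in x, warm-started.
        for _ in range(inner_iters):
            slack = u @ x - b            # >= 0 when the constraint holds
            mult = max(0.0, lam - rho * slack)
            x = project_simplex(x + eta * (r + mult * u))
        # Multiplier update using the current (last) primal iterate.
        lam = max(0.0, lam - rho * (u @ x - b))
    return x, lam
```

For example, with `r = np.array([1.0, 0.0])`, `u = np.array([0.0, 1.0])` and `b = 0.3`, the constrained optimum is the stochastic distribution (0.7, 0.3), and the final iterate `x` returned by `inexact_alm(r, u, b)` should approach it without any averaging over earlier iterates.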
