Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization

Anirudh Satheesh; Pankaj Kumar Barman; Washim Uddin Mondal; Vaneet Aggarwal

arXiv:2603.07698·cs.LG·March 10, 2026

Open Access

TL;DR

This paper proves global convergence for a neural network-based actor-critic algorithm in constrained Markov decision processes, extending theoretical guarantees to high-dimensional, continuous control settings.

Contribution

It introduces a primal-dual natural actor-critic algorithm with neural critics and establishes the first global convergence guarantees for CMDPs with general policies and neural network critics.
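To illustrate what a primal-dual natural policy gradient loop looks like, here is a minimal sketch on a toy one-state problem. It is not the paper's algorithm: there is no neural critic and no Markovian sampling, and the values are computed exactly. The constraint convention (constraint value must be non-negative), the function names, and the step sizes `eta`, `beta`, and `lam_max` are all assumptions made for illustration.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = theta - theta.max()
    p = np.exp(z)
    return p / p.sum()

def dual_update(lam, constraint_value, step, lam_max):
    # Projected dual step for a constraint of the form J_c >= 0 (assumed convention):
    # the multiplier grows when the constraint is violated (J_c < 0).
    return float(np.clip(lam - step * constraint_value, 0.0, lam_max))

def primal_dual_npg(r, c, iters=2000, eta=0.1, beta=0.05, lam_max=10.0):
    # r: per-action reward, c: per-action constraint utility (toy one-state problem).
    theta = np.zeros_like(r, dtype=float)
    lam = 0.0
    avg_pi = np.zeros_like(theta)
    for _ in range(iters):
        pi = softmax(theta)
        lagrangian_reward = r + lam * c
        # For a tabular softmax policy, the natural gradient step reduces to
        # adding the (Lagrangian) advantage to the preferences.
        advantage = lagrangian_reward - pi @ lagrangian_reward
        theta = theta + eta * advantage
        lam = dual_update(lam, pi @ c, beta, lam_max)
        avg_pi += pi
    # Primal-dual iterates can oscillate, so the averaged policy is returned.
    return avg_pi / iters, lam

avg_pi, lam = primal_dual_npg(np.array([1.0, 0.5, 0.0]), np.array([-1.0, 0.5, 1.0]))
print(avg_pi, lam)
```

In the setting the paper describes, the exact advantage used here would be replaced by an estimate from a neural network critic, and the policy would come from a general parameterized class rather than a tabular softmax.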

Findings

01

Achieves $\tilde{O}(T^{-1/4})$ convergence rate.

02

Provides the first theoretical guarantees for neural critic-based CMDPs.

03

Extends actor-critic analysis beyond linear critics.

Abstract

We study infinite-horizon Constrained Markov Decision Processes (CMDPs) with general policy parameterizations and multi-layer neural network critics. Existing theoretical analyses for constrained reinforcement learning largely rely on tabular policies or linear critics, which limits their applicability to high-dimensional and continuous control problems. We propose a primal-dual natural actor-critic algorithm that integrates neural critic estimation with natural policy gradient updates and leverages Neural Tangent Kernel (NTK) theory to control function-approximation error under Markovian sampling, without requiring access to mixing-time oracles. We establish global convergence and cumulative constraint violation rates of $\tilde{O}(T^{-1/4})$ up to approximation errors induced by the policy and critic classes. Our results provide the first such guarantees for CMDPs with general…
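For orientation, an average-reward CMDP and its Lagrangian are commonly written as below. The abstract does not fix notation, so the symbols $J_r$, $J_c$, $\theta$, $\lambda$ and the sign convention of the constraint are assumptions here, not the paper's definitions.

```latex
% Long-run average of a per-step signal g (reward r or constraint utility c)
% under the policy \pi_\theta:
J_g(\theta) = \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{T-1} g(s_t, a_t) \right],
  \qquad g \in \{r, c\}

% Constrained problem (assumed convention: constraint value must be non-negative):
\max_{\theta} \; J_r(\theta) \quad \text{subject to} \quad J_c(\theta) \ge 0

% Lagrangian and the saddle-point problem targeted by primal-dual methods:
\mathcal{L}(\theta, \lambda) = J_r(\theta) + \lambda\, J_c(\theta),
  \qquad \max_{\theta} \, \min_{\lambda \ge 0} \; \mathcal{L}(\theta, \lambda)
```

Under this convention, a primal-dual method alternates an ascent step on $\theta$ with a projected descent step on $\lambda$, and the two rates quoted in the abstract would bound the optimality gap in $J_r$ and the violation of the constraint on $J_c$, respectively.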

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning