Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization
Anirudh Satheesh, Pankaj Kumar Barman, Washim Uddin Mondal, Vaneet Aggarwal

TL;DR
This paper proves global convergence for a neural network-based actor-critic algorithm in constrained Markov decision processes, extending theoretical guarantees to high-dimensional, continuous control settings.
Contribution
It introduces a primal-dual natural actor-critic algorithm with neural critics and establishes the first global convergence guarantees for CMDPs with general policies and neural network critics.
Findings
Achieves an $\tilde{O}(T^{-1/4})$ convergence rate.
Provides the first theoretical guarantees for neural critic-based CMDPs.
Extends actor-critic analysis beyond linear critics.
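A rough way to read this rate, ignoring logarithmic factors and the approximation-error terms: suppose the error bound after $T$ steps has the form $C\,T^{-1/4}$, where $C$ is a hypothetical problem-dependent constant introduced only for this illustration. Reaching a target accuracy $\epsilon$ then requires
$$C\,T^{-1/4} \le \epsilon \iff T \ge (C/\epsilon)^{4},$$
so $T$ must grow on the order of $\tilde{O}(\epsilon^{-4})$. Halving $\epsilon$ multiplies the required $T$ by $2^4 = 16$.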
Abstract
We study infinite-horizon Constrained Markov Decision Processes (CMDPs) with general policy parameterizations and multi-layer neural network critics. Existing theoretical analyses for constrained reinforcement learning largely rely on tabular policies or linear critics, which limits their applicability to high-dimensional and continuous control problems. We propose a primal-dual natural actor-critic algorithm that integrates neural critic estimation with natural policy gradient updates and leverages Neural Tangent Kernel (NTK) theory to control function-approximation error under Markovian sampling, without requiring access to mixing-time oracles. We establish global convergence and cumulative constraint violation rates of $\tilde{O}(T^{-1/4})$ up to approximation errors induced by the policy and critic classes. Our results provide the first such guarantees for CMDPs with general…
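The primal-dual structure described in the abstract can be illustrated on a deliberately tiny problem. This is a sketch under stated assumptions, not the authors' algorithm. It assumes a single-state CMDP, so the average reward of a softmax policy is simply $\pi \cdot r$. It also assumes a constraint of the form $J_c(\pi) \ge 0$, uses exact expectations in place of the neural critic, and uses illustrative step sizes. All function names and parameters (`natural_gradient`, `primal_dual_npg`, `alpha`, `beta`, `lam_max`) are hypothetical. The primal step is natural policy gradient ascent on the Lagrangian $J_r + \lambda J_c$, computed with the pseudo-inverse of the Fisher matrix. The dual step is projected gradient descent on $\lambda$.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over a vector of action logits."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def natural_gradient(theta, payoff):
    """Natural gradient F^+ grad J for a one-state softmax policy, where J = pi . payoff."""
    pi = softmax(theta)
    scores = np.eye(len(pi)) - pi                 # row a: grad_theta log pi(a) = e_a - pi
    grad = scores.T @ (pi * payoff)               # sum_a pi(a) payoff(a) grad log pi(a)
    fisher = scores.T @ (pi[:, None] * scores)    # sum_a pi(a) score_a score_a^T
    # F is singular along the all-ones direction (a softmax shift), so use the pseudo-inverse.
    return np.linalg.pinv(fisher) @ grad


def primal_dual_npg(reward, cost, steps=4000, alpha=0.05, beta=0.05, lam_max=5.0):
    """Toy primal-dual NPG for: maximize J_r(pi) subject to J_c(pi) >= 0.

    Exact expectations stand in for the critic (an assumption of this sketch).
    Returns the iterate-averaged policy and the final Lagrange multiplier.
    """
    theta = np.zeros(len(reward))
    lam = 0.0
    pi = softmax(theta)
    pi_sum = np.zeros(len(reward))
    for _ in range(steps):
        # Dual step: projected gradient descent on lam; lam grows while J_c(pi) < 0.
        lam = float(np.clip(lam - beta * (pi @ cost), 0.0, lam_max))
        pi_sum += pi
        # Primal step: natural-gradient ascent on the Lagrangian J_r + lam * J_c.
        theta = theta + alpha * natural_gradient(theta, reward + lam * cost)
        pi = softmax(theta)
    return pi_sum / steps, lam


# Hypothetical 3-action example: action 0 pays most but violates the constraint.
reward = np.array([1.0, 0.5, 0.0])
cost = np.array([-1.0, 0.5, 1.0])
pi_bar, lam = primal_dual_npg(reward, cost)
print(f"J_r={pi_bar @ reward:.3f}  J_c={pi_bar @ cost:.3f}  lambda={lam:.3f}")
```

In this toy problem the constrained optimum is $\pi = (1/3, 2/3, 0)$, with $J_r = 2/3$, $J_c = 0$ and multiplier $\lambda = 1/3$. The iterate-averaged policy should approach these values. The last iterate can keep oscillating around them, which is why the sketch reports the averaged policy.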
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning
