Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm
Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

TL;DR
This paper introduces a conservative natural policy gradient primal-dual algorithm for constrained reinforcement learning that guarantees zero constraint violation and improves both convergence and sample complexity for infinite-horizon discounted CMDPs.
Contribution
The paper presents the first natural policy gradient primal-dual algorithm that achieves zero constraint violation with improved sample complexity for infinite-horizon discounted CMDPs.
Findings
Achieves zero constraint violation in constrained RL.
Improves sample complexity from O(1/ε^6) to O(1/ε^4).
Demonstrates effectiveness through experiments.
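As a rough worked illustration of the second finding, the ratio of the two bounds is shown below at a sample accuracy of ε = 0.01. It ignores the constants and logarithmic factors that the O(·) notation hides, so it indicates scaling only and is not a measured speedup. The gain grows like 1/ε²:

```latex
\frac{1/\epsilon^{6}}{1/\epsilon^{4}} = \frac{1}{\epsilon^{2}},
\qquad \epsilon = 0.01:\quad \frac{10^{12}}{10^{8}} = 10^{4}.
```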
Abstract
We consider the problem of constrained Markov decision processes (CMDPs) in continuous state-action spaces, where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) that achieves zero constraint violation while attaining state-of-the-art convergence results for the objective value function. For general policy parametrization, we prove convergence of the value function to the global optimum up to an approximation error due to the restricted policy class. We also improve the sample complexity of the existing constrained NPG-PD algorithm (Ding et al., 2020) from O(1/ε^6) to O(1/ε^4). To the best of our knowledge, this is the first work to establish zero constraint violation with natural policy gradient style algorithms for infinite-horizon discounted CMDPs.…
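The following is a minimal sketch of what a conservative primal-dual NPG loop can look like. It is not the authors' implementation, and it rests on several assumptions. It assumes a tabular softmax policy with exact policy evaluation, whereas the paper treats general parametrization with sampled estimates. It also assumes the "conservative" step amounts to tightening the constraint threshold from b to b + κ in the dual update. The primal step uses the softmax NPG form θ ← θ + η/(1-γ)·A_L with Lagrangian advantage A_L = A_r + λ·A_g. The toy CMDP, the slack `kappa`, the step sizes, `lam_max`, and the names `c_npg_pd`, `evaluate`, and `softmax` are all hypothetical.

```python
import math

# Hypothetical 2-state, 2-action CMDP, chosen only for illustration.
GAMMA = 0.9
STATES, ACTIONS = range(2), range(2)
RHO = [0.5, 0.5]                      # initial state distribution rho
P = [[[0.9, 0.1], [0.2, 0.8]],        # P[s][a][s'] transition probabilities
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[1.0, 0.0], [0.5, 0.2]]          # reward r(s, a) in [0, 1]
G = [[0.0, 1.0], [0.2, 0.8]]          # constraint utility g(s, a) in [0, 1]
B = 4.0                               # original constraint: V_g(rho) >= B


def softmax(theta):
    """Tabular softmax policy pi[s][a] from logits theta[s][a]."""
    pi = []
    for row in theta:
        m = max(row)                  # subtract max for numerical stability
        e = [math.exp(x - m) for x in row]
        z = sum(e)
        pi.append([x / z for x in e])
    return pi


def evaluate(pi, sig, iters=300):
    """Policy evaluation by fixed-point iteration; returns (V, Q) for signal sig."""
    v = [0.0 for _ in STATES]
    for _ in range(iters):
        q = [[sig[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in STATES)
              for a in ACTIONS] for s in STATES]
        v = [sum(pi[s][a] * q[s][a] for a in ACTIONS) for s in STATES]
    return v, q


def c_npg_pd(kappa=0.5, eta_theta=0.02, eta_lam=0.1, lam_max=10.0, T=200):
    """Primal-dual NPG against the tightened constraint V_g(rho) >= B + kappa."""
    theta = [[0.0 for _ in ACTIONS] for _ in STATES]
    lam, vr_sum, vg_sum = 0.0, 0.0, 0.0
    for _ in range(T):
        pi = softmax(theta)
        vr, qr = evaluate(pi, R)
        vg, qg = evaluate(pi, G)
        vr_rho = sum(RHO[s] * vr[s] for s in STATES)
        vg_rho = sum(RHO[s] * vg[s] for s in STATES)
        vr_sum, vg_sum = vr_sum + vr_rho, vg_sum + vg_rho
        # Primal NPG step (softmax form): theta += eta / (1 - gamma) * A_L,
        # where A_L = A_r + lam * A_g is the Lagrangian advantage.
        for s in STATES:
            for a in ACTIONS:
                adv = (qr[s][a] - vr[s]) + lam * (qg[s][a] - vg[s])
                theta[s][a] += eta_theta / (1 - GAMMA) * adv
        # Projected dual descent using the conservative threshold B + kappa:
        # lam grows while V_g(rho) falls short of B + kappa.
        lam = min(max(lam - eta_lam * (vg_rho - (B + kappa)), 0.0), lam_max)
    # Return the final policy, final multiplier, and iterate-averaged values.
    return softmax(theta), lam, vr_sum / T, vg_sum / T


pi, lam, avg_vr, avg_vg = c_npg_pd()
print(pi, lam, avg_vr, avg_vg)
```

The function returns iterate-averaged values because primal-dual guarantees of this kind are typically stated for averages over iterations rather than for the last iterate. The intuition behind the slack κ > 0 is that a bounded optimization error against the tighter threshold B + κ can still leave the original constraint V_g(ρ) ≥ B satisfied.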
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
