Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

Qinbo Bai; Amrit Singh Bedi; Vaneet Aggarwal

arXiv:2206.05850·cs.LG·May 20, 2024

TL;DR

This paper introduces C-NPG-PD, a conservative natural policy gradient primal-dual algorithm for constrained reinforcement learning. It achieves zero constraint violation and improves the sample complexity of the existing constrained NPG-PD algorithm from O(1/ε^6) to O(1/ε^4) for infinite-horizon discounted CMDPs.

Contribution

To the authors' knowledge, the paper presents the first natural policy gradient primal-dual algorithm that achieves zero constraint violation, with improved sample complexity, for infinite-horizon discounted CMDPs.

Findings

01. Achieves zero constraint violation in constrained RL.

02. Improves sample complexity from O(1/ε^6) to O(1/ε^4).

03. Demonstrates effectiveness through experiments.

Abstract

We consider the problem of constrained Markov decision processes (CMDPs) in continuous state-action spaces, where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) that achieves zero constraint violation while attaining state-of-the-art convergence results for the objective value function. For general policy parametrization, we prove convergence of the value function to the global optimum up to an approximation error due to the restricted policy class. We also improve the sample complexity of the existing constrained NPG-PD algorithm \cite{Ding2020} from $O(1/\epsilon^{6})$ to $O(1/\epsilon^{4})$. To the best of our knowledge, this is the first work to establish zero constraint violation with natural policy gradient style algorithms for infinite-horizon discounted CMDPs.…
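The primal-dual idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a tabular softmax policy with exact policy evaluation (the paper treats continuous spaces and general parametrization with sampled estimates), and it assumes the "conservative" step means tightening the constraint threshold `b` by a slack `kappa`. The names `evaluate`, `conservative_npg_pd`, `b`, `kappa`, the step sizes, and the toy CMDP are all hypothetical and chosen for illustration only.

```python
import numpy as np

def softmax(theta):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def evaluate(P, c, pi, gamma, rho):
    """Exact discounted value J and advantage A of a per-step signal c under pi."""
    n_states = c.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    c_pi = (pi * c).sum(axis=1)             # expected per-step signal per state
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)
    Q = c + gamma * (P @ V)                 # shape (n_states, n_actions)
    return float(rho @ V), Q - V[:, None]

def conservative_npg_pd(P, r, g, b, kappa, gamma=0.9, eta_theta=1.0,
                        eta_lam=0.1, lam_max=10.0, iters=500):
    """Sketch: maximize J_r subject to J_g >= b, trained against J_g >= b + kappa."""
    n_states, n_actions = r.shape
    rho = np.full(n_states, 1.0 / n_states)  # assumed uniform initial distribution
    theta = np.zeros((n_states, n_actions))
    lam = 0.0
    for _ in range(iters):
        pi = softmax(theta)
        _, A_r = evaluate(P, r, pi, gamma, rho)
        J_g, A_g = evaluate(P, g, pi, gamma, rho)
        # For tabular softmax, the NPG step reduces to an advantage-weighted update.
        theta = theta + eta_theta / (1.0 - gamma) * (A_r + lam * A_g)
        # Projected dual descent on the tightened (conservative) constraint.
        lam = float(np.clip(lam - eta_lam * (J_g - (b + kappa)), 0.0, lam_max))
    pi = softmax(theta)
    J_r, _ = evaluate(P, r, pi, gamma, rho)
    J_g, _ = evaluate(P, g, pi, gamma, rho)
    return pi, J_r, J_g, lam

# Hypothetical 2-state, 2-action CMDP: action 0 stays put, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
r = np.array([[0.0, 1.0], [0.0, 1.0]])  # reward favors action 1
g = np.array([[1.0, 0.0], [1.0, 0.0]])  # constraint utility favors action 0
pi, J_r, J_g, lam = conservative_npg_pd(P, r, g, b=3.0, kappa=0.5)
```

The slack `kappa` trades a small amount of objective value for a safety margin on the constraint; how the paper chooses it, and the resulting guarantees, are not stated on this page.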

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms