Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning
Alessandro Montenegro, Marco Mussi, Matteo Papini, Alberto Maria Metelli

TL;DR
This paper introduces a new gradient-based primal-dual framework for constrained reinforcement learning, providing global convergence guarantees and extending to risk-based constraints, with empirical validation on control problems.
Contribution
The paper proposes C-PG, a novel exploration-agnostic algorithm with last-iterate convergence guarantees, and extends it to risk-sensitive constraints in constrained reinforcement learning (CRL).
Findings
C-PG achieves global last-iterate convergence under weak assumptions.
The algorithms effectively handle risk-based constraints.
In the reported numerical experiments, the proposed algorithms outperform state-of-the-art baselines.
Abstract
Constrained Reinforcement Learning (CRL) tackles sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints, which are often formulated as expected costs. In this setting, policy-based methods are widely used since they come with several advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or parameter-based exploration strategy, depending on whether they learn directly the parameters of a stochastic policy or those of a stochastic hyperpolicy. In this paper, we propose a general framework for addressing CRL problems via gradient-based primal-dual algorithms, relying on an alternate ascent/descent scheme with dual-variable regularization. We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global…
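The alternate ascent/descent scheme with dual-variable regularization mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's C-PG algorithm: it assumes a one-dimensional parameter `theta`, a hypothetical return J(theta) = -(theta - 2)^2, a hypothetical cost C(theta) = theta with threshold b = 1, exact gradients in place of policy-gradient estimates, and a regularized Lagrangian L(theta, lam) = J(theta) - lam * (C(theta) - b) + (omega / 2) * lam^2 whose sign convention is chosen here for illustration. The function name, step sizes, and regularization value are likewise assumptions.

```python
def primal_dual(omega=0.01, lr_theta=0.05, lr_lam=0.05, steps=2000):
    """Alternate ascent on theta and projected descent on lam (toy problem).

    All objectives and constants are hypothetical; exact gradients stand in
    for the sampled policy-gradient estimates a real CRL method would use.
    """
    b = 1.0                  # cost threshold (hypothetical)
    theta, lam = 0.0, 0.0    # primal parameter and dual variable
    for _ in range(steps):
        # Primal ascent step on
        # L(theta, lam) = J(theta) - lam * (C(theta) - b) + (omega / 2) * lam**2
        # with J(theta) = -(theta - 2)**2 and C(theta) = theta.
        grad_theta = -2.0 * (theta - 2.0) - lam
        theta += lr_theta * grad_theta

        # Dual descent step using the freshly updated theta; the omega term
        # is the regularization that pulls lam toward zero.
        grad_lam = -(theta - b) + omega * lam
        lam = max(0.0, lam - lr_lam * grad_lam)  # projection onto lam >= 0
    return theta, lam


if __name__ == "__main__":
    theta, lam = primal_dual()
    print(f"theta = {theta:.4f}, lam = {lam:.4f}")
```

In this toy setting the iterates settle at the saddle point of the regularized Lagrangian, theta = (4 * omega + 1) / (2 * omega + 1), which lies slightly above the threshold b = 1; a smaller `omega` shrinks that gap. The sketch returns the final iterate rather than an average of the iterates, which is the sense of "last-iterate" used in the title.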
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Age of Information Optimization
