Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning

Alessandro Montenegro; Marco Mussi; Matteo Papini; Alberto Maria Metelli

arXiv:2407.10775 · cs.LG · November 13, 2024

TL;DR

This paper introduces a gradient-based primal-dual framework for constrained reinforcement learning with last-iterate global convergence guarantees, extends it to risk-based constraints, and validates it empirically on control problems.

Contribution

The paper proposes C-PG, a novel exploration-agnostic algorithm with last-iterate global convergence guarantees, and extends it to risk-sensitive constraints in CRL.
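This page does not say which risk measures the paper uses for its risk-sensitive constraints. As one illustration only, the sketch below estimates the Conditional Value-at-Risk (CVaR) of a sample of episode costs, a widely used risk measure, via the Rockafellar-Uryasev form CVaR_α(C) = min_η { η + E[(C − η)^+] / (1 − α) }, evaluated at the empirical lower α-quantile (which is a minimizer). The function name `empirical_cvar` and the cost data are hypothetical; in a primal-dual loop one could, as an assumption, use such an estimate in place of the expected cost in the dual update, but the paper's actual risk formulation may differ.

```python
import numpy as np


def empirical_cvar(costs, alpha=0.9):
    """Hypothetical helper: empirical CVaR_alpha of a cost sample using
    CVaR_alpha(C) = min_eta  eta + E[(C - eta)^+] / (1 - alpha),
    evaluated at eta = lower empirical alpha-quantile (the VaR), a minimizer."""
    costs = np.asarray(costs, dtype=float)
    # "inverted_cdf" returns the smallest sample x with F(x) >= alpha (NumPy >= 1.22).
    value_at_risk = np.quantile(costs, alpha, method="inverted_cdf")
    excess = np.maximum(costs - value_at_risk, 0.0)  # (C - eta)^+
    return value_at_risk + excess.mean() / (1.0 - alpha)


# Hypothetical per-episode costs 1, 2, ..., 100.
episode_costs = np.arange(1, 101)
print(empirical_cvar(episode_costs, alpha=0.9))
```

For costs 1 to 100 and α = 0.9 this returns 95.5 (up to floating-point rounding), the mean of the ten largest costs, whereas the expected cost is only 50.5; a CVaR constraint therefore penalizes the tail of the cost distribution rather than its average.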

Findings

1. C-PG achieves global last-iterate convergence under weak assumptions.
2. The proposed algorithms effectively handle risk-based constraints.
3. In numerical experiments, the proposed algorithms outperform state-of-the-art baselines.

Abstract

Constrained Reinforcement Learning (CRL) tackles sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints, which are often formulated as expected costs. In this setting, policy-based methods are widely used since they come with several advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or parameter-based exploration strategy, depending on whether they directly learn the parameters of a stochastic policy or those of a stochastic hyperpolicy. In this paper, we propose a general framework for addressing CRL problems via gradient-based primal-dual algorithms, relying on an alternating ascent/descent scheme with dual-variable regularization. We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global…
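A minimal sketch of an alternating ascent/descent scheme with dual-variable regularization, written in Python with NumPy under loudly stated assumptions: a hypothetical one-step problem with a Gaussian policy N(θ, σ²), reward r(a) = −(a − 3)², cost c(a) = a and threshold b = 1; an action-based REINFORCE gradient estimator with a batch-mean baseline; and the regularized Lagrangian L_ω(θ, λ) = J_r(θ) − λ(J_c(θ) − b) + (ω/2)λ², where J_r and J_c denote expected reward and expected cost. The toy problem, the step sizes, and the name `regularized_primal_dual` are illustrative choices, not the paper's C-PG algorithm or its hyperparameters. A parameter-based (hyperpolicy) variant would swap in a hyperpolicy gradient estimator for the primal step; that variant is not shown.

```python
import numpy as np

# Hypothetical one-step toy problem (an illustration, not from the paper):
# action a ~ N(theta, SIGMA^2), reward r(a) = -(a - 3)^2, cost c(a) = a,
# constraint E[c(a)] <= B. Without regularization the constrained optimum is theta = 1.
SIGMA, B = 0.5, 1.0


def reward(a):
    return -(a - 3.0) ** 2


def cost(a):
    return a


def regularized_primal_dual(omega=0.01, alpha=0.05, beta=0.05,
                            iters=2000, batch=1000, seed=0):
    """Alternating gradient ascent on theta / projected descent on lam for
    L_omega(theta, lam) = J_r(theta) - lam * (J_c(theta) - B) + (omega / 2) * lam**2."""
    rng = np.random.default_rng(seed)
    theta, lam = 0.0, 0.0
    for _ in range(iters):
        # Primal step: REINFORCE (action-based) estimate of grad_theta [J_r - lam * J_c].
        a = rng.normal(theta, SIGMA, size=batch)
        payoff = reward(a) - lam * cost(a)
        score = (a - theta) / SIGMA**2  # grad_theta of log N(a; theta, SIGMA^2)
        grad_theta = np.mean(score * (payoff - payoff.mean()))  # batch-mean baseline
        theta += alpha * grad_theta
        # Dual step using the updated theta: dL/dlam = -(J_c - B) + omega * lam,
        # followed by projection onto lam >= 0.
        a = rng.normal(theta, SIGMA, size=batch)
        grad_lam = -(cost(a).mean() - B) + omega * lam
        lam = max(0.0, lam - beta * grad_lam)
    return theta, lam


theta, lam = regularized_primal_dual()
print(f"theta = {theta:.3f}, lambda = {lam:.3f}")
```

For this toy problem the saddle point of L_ω can be worked out by hand: setting both partial derivatives to zero gives λ = 6 − 2θ and θ = 1 + ωλ, hence θ = (1 + 6ω)/(1 + 2ω). For ω = 0.01 that is θ ≈ 1.039 and λ ≈ 3.92, so the last iterate should settle near those values up to sampling noise. In general, the regularization term makes the Lagrangian strongly convex in λ, the kind of structure that last-iterate analyses rely on; the price is a bias, visible here as a constraint violation of ωλ (about 0.04), which shrinks as ω decreases.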

Videos

Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Age of Information Optimization