Adaptive Primal-Dual Method for Safe Reinforcement Learning

Weiqin Chen; James Onyejizu; Long Vu; Lan Hoang; Dharmashankar Subramanian; Koushik Kar; Sandipan Mishra; Santiago Paternain

arXiv:2402.00355·cs.LG·February 2, 2024·1 cites

TL;DR

This paper introduces an adaptive primal-dual method for safe reinforcement learning that dynamically adjusts learning rates, leading to improved stability and performance in constrained policy optimization tasks.

Contribution

The paper proposes a novel adaptive primal-dual algorithm with theoretical guarantees and demonstrates its effectiveness over existing methods in various environments.

Findings

01 Outperforms constant learning rate (LR) methods in stability and performance

02 Achieves results comparable to or better than state-of-the-art SRL algorithms

03 Demonstrates the robustness of adaptive learning rates through empirical evidence

Abstract

Primal-dual methods have a natural application in Safe Reinforcement Learning (SRL), posed as a constrained policy optimization problem. In practice, however, applying primal-dual methods to SRL is challenging, due to the inter-dependency of the learning rate (LR) and Lagrangian multipliers (dual variables) each time an embedded unconstrained RL problem is solved. In this paper, we propose, analyze and evaluate adaptive primal-dual (APD) methods for SRL, where two adaptive LRs are adjusted to the Lagrangian multipliers so as to optimize the policy in each iteration. We theoretically establish the convergence, optimality and feasibility of the APD algorithm. Finally, we conduct numerical evaluation of the practical APD algorithm on four well-known environments in Bullet-Safety-Gym, employing two state-of-the-art SRL algorithms: PPO-Lagrangian and DDPG-Lagrangian. All experiments show that…
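To make the primal-dual idea concrete, the following is a minimal sketch on a toy scalar problem, not the paper's APD algorithm. It maximizes a hypothetical reward -(theta - 2)^2 subject to the constraint theta <= 1, alternating a gradient ascent step on the Lagrangian in theta with a projected ascent step on the multiplier. The function name, the toy objective, and in particular the rule that shrinks the primal step as `base_lr / (1 + lam)` are illustrative assumptions, chosen only to show a learning rate that depends on the dual variable.

```python
def primal_dual(theta=0.0, lam=0.0, base_lr=0.1, dual_lr=0.05, iters=5000):
    """Toy primal-dual loop (illustrative assumption, not the paper's method).

    Problem: maximize -(theta - 2)^2  subject to  theta - 1 <= 0.
    Lagrangian: L(theta, lam) = -(theta - 2)^2 - lam * (theta - 1).
    """
    for _ in range(iters):
        # Assumed adaptive rule: a larger multiplier gives a smaller primal step.
        primal_lr = base_lr / (1.0 + lam)
        # Primal step: gradient ascent on L with respect to theta.
        grad_theta = -2.0 * (theta - 2.0) - lam
        theta = theta + primal_lr * grad_theta
        # Dual step: move lam along the constraint violation, projected onto lam >= 0.
        lam = max(0.0, lam + dual_lr * (theta - 1.0))
    return theta, lam


if __name__ == "__main__":
    theta, lam = primal_dual()
    print(f"theta = {theta:.4f}, lambda = {lam:.4f}")
```

For this toy problem the constrained optimum is theta = 1 with multiplier lambda = 2, so the loop can be checked against a known answer. In an actual SRL setting the primal step would be a policy update (for example, by PPO or DDPG) on a Lagrangian-weighted objective rather than a closed-form gradient.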

Taxonomy

Topics: Traffic control and management · Extremum Seeking Control Systems · Greenhouse Technology and Climate Control