Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

Adam Stooke, Joshua Achiam, and Pieter Abbeel

arXiv:2007.03964·math.OC·July 9, 2020·44 cites

TL;DR

This paper introduces a PID-based Lagrange multiplier update for safe reinforcement learning that damps the oscillations and overshoot of traditional Lagrangian methods, setting a new state of the art on the Safety Gym benchmark.

Contribution

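It proposes a Lagrange multiplier update that adds proportional and derivative control terms to the traditional integral-style update, improving learning stability and robustness in safe RL. It also introduces a method that eases controller tuning by providing invariance to relative numerical scales.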

Findings

1. Achieved state-of-the-art performance on the Safety Gym benchmark.
2. Demonstrated improved hyperparameter robustness and stability.
3. Provided a simple, effective method for controller tuning.

Abstract

Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior during agent training. We address this shortcoming by proposing a novel Lagrange multiplier update method that utilizes derivatives of the constraint function. We take a controls perspective, wherein the traditional Lagrange multiplier update behaves as integral control; our terms introduce proportional and derivative control, achieving favorable learning dynamics through damping and predictive measures. We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark. Lastly, we introduce a new method to ease controller tuning by providing invariance to the relative numerical scales…
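
The controls perspective in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `PIDLagrangian`, the gains `kp`, `ki`, `kd`, the `cost_limit` parameter, and the choice to clamp the integral, the derivative, and the resulting multiplier at zero are all assumptions made for illustration. Setting `kp = kd = 0` recovers a purely integral update, which is how the abstract characterizes the traditional Lagrange multiplier update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDLagrangian:
    """Hypothetical PID-style controller for a Lagrange multiplier (illustrative only)."""

    kp: float          # proportional gain (assumed name, not from the source)
    ki: float          # integral gain
    kd: float          # derivative gain
    cost_limit: float  # constraint threshold: we want measured cost <= cost_limit
    integral: float = 0.0
    prev_cost: Optional[float] = None

    def update(self, cost: float) -> float:
        """Return the new multiplier given the latest measured episode cost."""
        error = cost - self.cost_limit  # positive when the constraint is violated

        # Integral term: accumulates constraint violation over iterations.
        # Clamping at zero is an assumption that keeps the term non-negative.
        self.integral = max(0.0, self.integral + error)

        # Derivative term: here it reacts only to rising cost (an assumption),
        # so it penalizes movement toward violation but not improvement.
        if self.prev_cost is None:
            derivative = 0.0
        else:
            derivative = max(0.0, cost - self.prev_cost)
        self.prev_cost = cost

        # Proportional term responds immediately to the current violation.
        multiplier = self.kp * error + self.ki * self.integral + self.kd * derivative

        # A Lagrange multiplier for an inequality constraint must be non-negative.
        return max(0.0, multiplier)


# Usage: integral-only control versus control with a derivative term.
integral_only = PIDLagrangian(kp=0.0, ki=1.0, kd=0.0, cost_limit=1.0)
with_derivative = PIDLagrangian(kp=0.0, ki=0.0, kd=1.0, cost_limit=1.0)
```

In a training loop, the returned multiplier would weight the cost objective against the reward objective at each policy update; how that weighting is applied depends on the underlying RL algorithm and is not specified here.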

Videos

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods · SlidesLive

Taxonomy

Topics: Advanced Control Systems Optimization · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control