Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
Adam Stooke, Joshua Achiam, and Pieter Abbeel

TL;DR
This paper introduces a PID-based Lagrangian method for safe reinforcement learning that improves stability and robustness, achieving state-of-the-art results in Safety Gym benchmarks.
Contribution
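It proposes a PID-inspired Lagrange multiplier update that adds proportional and derivative terms to the traditional integral-like update, improving learning stability and robustness in safe RL. It also introduces a method that eases controller tuning by providing invariance to relative numerical scales.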
Findings
Achieved state-of-the-art performance in Safety Gym benchmarks.
Demonstrated improved hyperparameter robustness and stability.
Provided a simple, effective method for controller tuning.
Abstract
Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior during agent training. We address this shortcoming by proposing a novel Lagrange multiplier update method that utilizes derivatives of the constraint function. We take a controls perspective, wherein the traditional Lagrange multiplier update behaves as integral control; our terms introduce proportional and derivative control, achieving favorable learning dynamics through damping and predictive measures. We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark. Lastly, we introduce a new method to ease controller tuning by providing invariance to the relative numerical scales…
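
The controls perspective in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name, gain names (kp, ki, kd), the cost_limit parameter, and the choice to clip the integral, derivative, and output at zero are all assumptions made for illustration. Setting kp and kd to zero leaves only the integral term, which corresponds to the traditional multiplier update described above.

```python
from dataclasses import dataclass


@dataclass
class PIDLagrangian:
    """Hypothetical PID controller that outputs a Lagrange multiplier.

    All names and the clipping choices are illustrative assumptions,
    not taken from the paper's code.
    """
    kp: float            # proportional gain (assumed name)
    ki: float            # integral gain (assumed name)
    kd: float            # derivative gain (assumed name)
    cost_limit: float    # constraint threshold on the measured cost
    integral: float = 0.0
    prev_cost: float = 0.0

    def update(self, cost: float) -> float:
        # Constraint violation: positive when the cost exceeds the limit.
        error = cost - self.cost_limit
        # Integral term: accumulated violation, kept non-negative.
        self.integral = max(0.0, self.integral + error)
        # Derivative term: reacts only to rising cost (a predictive signal).
        derivative = max(0.0, cost - self.prev_cost)
        self.prev_cost = cost
        # The multiplier itself must stay non-negative.
        return max(
            0.0,
            self.kp * error + self.ki * self.integral + self.kd * derivative,
        )


if __name__ == "__main__":
    # Integral-only control versus full PID on the same made-up cost sequence.
    costs = [2.0, 2.5, 1.5, 0.5]
    i_only = PIDLagrangian(kp=0.0, ki=0.1, kd=0.0, cost_limit=1.0)
    pid = PIDLagrangian(kp=1.0, ki=0.1, kd=0.5, cost_limit=1.0)
    for c in costs:
        print(c, i_only.update(c), pid.update(c))
```

In a training loop, the returned multiplier would weight the cost term in the policy objective at each iteration; how that objective is formed is outside the scope of this sketch.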
Taxonomy
Topics: Advanced Control Systems Optimization · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
