Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement Learning
Jiaming Zhang, Yujie Yang, Yao Lyu, Shengbo Eben Li, Liping Zhang

TL;DR
This paper introduces ALaM, a stable augmented Lagrangian framework for state-wise safety constraints in reinforcement learning, improving training stability and safety performance.
Contribution
The paper proposes a novel ALaM framework with quadratic penalties and supervised multiplier training, ensuring convergence and stability in state-wise safety RL.
Findings
ALaM guarantees multiplier convergence and optimal policy recovery.
SAC-ALaM outperforms existing safe RL methods in safety and return.
Training stability is significantly improved with the proposed method.
Abstract
Safety is a primary challenge in real-world reinforcement learning (RL). Formulating safety requirements as state-wise constraints has become a prominent paradigm. Handling state-wise constraints with the Lagrangian method requires a distinct multiplier for every state, so the multipliers must be approximated by a neural network, called a multiplier network. However, applying standard dual gradient ascent to multiplier networks induces severe training oscillations. This is because the inherent instability of dual ascent is exacerbated by network generalization: local overshoots and delayed updates propagate to adjacent states, further amplifying policy fluctuations. Existing stabilization techniques are designed for scalar multipliers and are inadequate for state-dependent multiplier networks. To address this challenge, we propose an augmented Lagrangian multiplier network (ALaM) framework for…
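As a rough illustration of the instability the abstract attributes to dual ascent, the sketch below compares plain dual gradient ascent with a textbook augmented Lagrangian update that adds a quadratic penalty. It is a scalar toy problem (minimize x^2 subject to 1 - x <= 0, with optimum x = 1 and multiplier 2), not the paper's ALaM algorithm and not a multiplier network. The problem, the function names, and the step size and penalty weight of 10 are all assumptions chosen for illustration.

```python
def dual_ascent(step, iters=40):
    """Plain dual gradient ascent on: min x^2  s.t.  1 - x <= 0 (toy problem)."""
    lam, history = 0.0, []
    for _ in range(iters):
        # Primal step: closed-form argmin_x of x^2 + lam * (1 - x).
        x = lam / 2.0
        # Dual step: projected gradient ascent on the multiplier.
        lam = max(0.0, lam + step * (1.0 - x))
        history.append((x, lam))
    return history


def augmented_lagrangian(rho, iters=40):
    """Textbook augmented Lagrangian (quadratic penalty) on the same toy problem."""
    lam, history = 0.0, []
    for _ in range(iters):
        # Primal step: closed-form argmin_x of
        #   x^2 + (max(0, lam + rho * (1 - x))^2 - lam^2) / (2 * rho),
        # which lies in the region where the penalty term is active.
        x = (lam + rho) / (2.0 + rho)
        # Multiplier step: the standard augmented Lagrangian update.
        lam = max(0.0, lam + rho * (1.0 - x))
        history.append((x, lam))
    return history


if __name__ == "__main__":
    # Same numeric value used as the dual step size and as the penalty weight.
    for name, hist in [("dual ascent", dual_ascent(step=10.0)),
                       ("augmented Lagrangian", augmented_lagrangian(rho=10.0))]:
        x, lam = hist[-1]
        print(f"{name}: final x = {x:.4f}, final multiplier = {lam:.4f}")
```

With these assumed settings, the plain dual ascent multiplier keeps jumping between two values instead of settling, while the augmented Lagrangian iterates contract toward the constrained optimum. The paper's setting differs in that the multiplier is a function of the state represented by a network, which this scalar sketch does not capture.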
