Towards a Practical Understanding of Lagrangian Methods in Safe Reinforcement Learning
Lindsay Spoor, Álvaro Serra-Gómez, Aske Plaat, Thomas Moerland

TL;DR
This paper empirically analyzes Lagrangian methods in safe reinforcement learning. It shows how sensitive the return-cost trade-off is to the Lagrange multiplier and how much the choice of cost limit matters, supported by Pareto frontiers and an open-source benchmark.
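For orientation, the standard constrained problem that Lagrangian methods relax, and its Lagrangian, can be written as below. The notation is assumed for this sketch rather than copied from the paper. J_R(π) and J_C(π) are the expected return and expected cost of policy π, d is the cost limit, and λ ≥ 0 is the Lagrange multiplier. A larger λ penalizes cost more heavily, which is the return-cost trade-off described here.

```latex
\max_{\pi} \; J_R(\pi) \quad \text{s.t.} \quad J_C(\pi) \le d,
\qquad
\mathcal{L}(\pi, \lambda) = J_R(\pi) - \lambda \bigl( J_C(\pi) - d \bigr),
\qquad
\max_{\pi} \, \min_{\lambda \ge 0} \, \mathcal{L}(\pi, \lambda)
```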
Contribution
It provides a systematic empirical study of the return-cost trade-off in safe RL and uses Pareto frontiers to visualize that trade-off. It also offers guidelines for cost limit selection and releases an open-source code base.
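A Pareto frontier over (return, cost) pairs keeps only the runs that no other run beats on both objectives. The minimal sketch below assumes that each training run is summarized by one mean episodic return and one mean episodic cost. The function name `pareto_frontier` and the data layout are illustrative and are not taken from the paper's code base.

```python
from typing import List, Tuple


def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (return, cost) pairs, sorted by increasing cost.

    Higher return is better and lower cost is better. Hypothetical helper,
    not the paper's implementation.
    """
    # Sort by cost ascending; among equal costs, put the higher return first.
    ordered = sorted(points, key=lambda p: (p[1], -p[0]))
    frontier: List[Tuple[float, float]] = []
    best_return = float("-inf")
    for ret, cost in ordered:
        # A point survives only if it strictly beats every return seen at
        # lower or equal cost; otherwise an earlier point dominates it.
        if ret > best_return:
            frontier.append((ret, cost))
            best_return = ret
    return frontier


# Hypothetical (return, cost) pairs, for illustration only.
runs = [(10.0, 5.0), (8.0, 2.0), (12.0, 9.0), (7.0, 3.0), (12.0, 10.0)]
print(pareto_frontier(runs))  # [(8.0, 2.0), (10.0, 5.0), (12.0, 9.0)]
```

The single sorted pass runs in O(n log n). It also avoids the O(n²) pairwise dominance check, which matters when many seeds and multipliers are swept.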
Findings
Lagrange multiplier sensitivity varies across tasks and regimes.
Cost restrictiveness can differ within the same task.
Careful cost limit selection is crucial for evaluating safe RL methods.
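The best achievable return depends on where the cost limit sits. One way to act on the findings above is to sweep fixed multipliers and, for each cost limit, pick the highest-return run that satisfies that limit. This is a hedged sketch. The helper `best_feasible_lambda`, its dictionary layout, and the numbers in the usage lines are all hypothetical, and none of them are results from the paper.

```python
from typing import Dict, Optional, Tuple


def best_feasible_lambda(
    results: Dict[float, Tuple[float, float]], cost_limit: float
) -> Optional[float]:
    """Among fixed-lambda runs, return the lambda with the highest mean return
    whose mean cost stays within cost_limit (hypothetical helper)."""
    feasible = {
        lam: (ret, cost) for lam, (ret, cost) in results.items() if cost <= cost_limit
    }
    if not feasible:
        return None  # no fixed multiplier satisfied this cost limit
    return max(feasible, key=lambda lam: feasible[lam][0])


# Hypothetical sweep results: lambda -> (mean return, mean cost). Not data from the paper.
sweep = {0.0: (30.0, 60.0), 0.1: (25.0, 35.0), 0.5: (18.0, 20.0), 1.0: (10.0, 8.0)}
print(best_feasible_lambda(sweep, cost_limit=40.0))  # 0.1
print(best_feasible_lambda(sweep, cost_limit=25.0))  # 0.5
print(best_feasible_lambda(sweep, cost_limit=5.0))   # None
```

Even in this toy sweep, moving the cost limit changes which multiplier is optimal. It can also leave no feasible run at all. This is why the choice of cost limit shapes any comparison between methods.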
Abstract
Safe reinforcement learning addresses constrained optimization problems in which maximizing performance must be balanced against safety constraints, and Lagrangian methods are a widely used approach for this purpose. However, the effectiveness of Lagrangian methods depends crucially on the choice of the Lagrange multiplier λ, which governs the multi-objective trade-off between return and cost. A common practice is to update the multiplier automatically during training. Although this approach is standard, there remains limited empirical evidence on the optimally achievable trade-off between return and cost as a function of λ. There is also currently no systematic benchmark comparing automated update mechanisms to this empirical optimum. Therefore, we study (i) the constraint geometry for eight widely used safety tasks and (ii) the previously overlooked…
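The abstract refers to updating the multiplier automatically during training. A common textbook rule is a projected gradient step on λ, driven by how far the measured cost is above or below the limit. The sketch below shows only that plain variant. Other automated mechanisms exist (for example, PID-based updates), and the abstract does not say which ones the paper benchmarks. The function name, learning rate, and cost sequence are hypothetical.

```python
def update_multiplier(lam: float, episode_cost: float, cost_limit: float, lr: float) -> float:
    """One projected gradient step on the Lagrange multiplier (hypothetical helper).

    The Lagrangian J_R - lam * (J_C - d) decreases in lam when J_C > d, so
    minimizing it over lam >= 0 means raising lam while the constraint is
    violated and lowering it (never below 0) once cost is under the limit.
    """
    lam = lam + lr * (episode_cost - cost_limit)
    return max(0.0, lam)  # project back onto the feasible set lam >= 0


lam = 0.0
# Hypothetical per-iteration mean episode costs, for illustration only.
for cost in [40.0, 35.0, 28.0, 22.0, 18.0]:
    lam = update_multiplier(lam, cost, cost_limit=25.0, lr=0.01)
    print(f"cost={cost:5.1f}  lambda={lam:.3f}")
```

With these made-up costs, λ climbs while cost exceeds the limit of 25 and starts to fall once cost drops below it. Because the learning rate controls how quickly λ reacts, this rule can lag behind or overshoot the constraint.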
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research
