Interpreting Primal-Dual Algorithms for Constrained Multiagent Reinforcement Learning
Daniel Tabas, Ahmed S. Zamzam, Baosen Zhang

TL;DR
This paper analyzes primal-dual algorithms in constrained multiagent reinforcement learning, revealing how penalty terms influence safety and value estimation, and proposes an improved algorithm with better safety guarantees and convergence.
Contribution
It reinterprets the penalty terms used in primal-dual methods as enforcing probabilistic constraints, and introduces a value estimation technique for safer, faster learning.
Findings
Using the constraint function itself as the penalty yields only a weak notion of safety.
Simple modifications to the penalty term enforce meaningful probabilistic (chance and conditional value at risk) constraints.
Proposed value estimate accelerates convergence to safe policies.
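The first two findings can be made concrete with a generic constrained-MDP formulation. What follows is a sketch under assumed notation, not the paper's own statement: the reward r, constraint function g, penalty c, budget b, multiplier \lambda, discount \gamma, and discounted state occupancy d_\gamma^\pi are all symbols introduced here for illustration.

```latex
\begin{align}
  % Constrained problem with a generic per-step penalty c(s)
  \max_{\pi}\;
    & \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,a_t)\Big]
  \quad \text{s.t.} \quad
    (1-\gamma)\,\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\, c(s_t)\Big] \le b, \\
  % Lagrangian: the penalty is subtracted from the reward
  L(\pi,\lambda)
    &= \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\big(r(s_t,a_t)-\lambda\, c(s_t)\big)\Big]
       + \frac{\lambda b}{1-\gamma},
  \qquad \lambda \ge 0, \\
  % Penalty = constraint function: only a discounted average of g is bounded
  c(s) = g(s)
    &\;\Longrightarrow\; \mathbb{E}_{s\sim d_\gamma^{\pi}}\big[g(s)\big] \le b, \\
  % Penalty = violation indicator: a chance constraint
  c(s) = \mathbf{1}\{g(s) > 0\}
    &\;\Longrightarrow\; \Pr\nolimits_{s\sim d_\gamma^{\pi}}\big(g(s) > 0\big) \le b, \\
  % Rockafellar-Uryasev form of CVaR, which motivates a hinge penalty
  \mathrm{CVaR}_{\beta}(g)
    &= \min_{\alpha}\Big\{\alpha + \tfrac{1}{1-\beta}\,
       \mathbb{E}_{s\sim d_\gamma^{\pi}}\big[(g(s)-\alpha)_{+}\big]\Big\}.
\end{align}
```

Read this way, the standard penalty bounds only an average of g, while the indicator and hinge penalties turn the same machinery into chance and CVaR constraints respectively.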
Abstract
Constrained multiagent reinforcement learning (C-MARL) is gaining importance as MARL algorithms find new applications in real-world systems ranging from energy systems to drone swarms. Most C-MARL algorithms use a primal-dual approach to enforce constraints through a penalty function added to the reward. In this paper, we study the structural effects of this penalty term on the MARL problem. First, we show that the standard practice of using the constraint function as the penalty leads to a weak notion of safety. However, by making simple modifications to the penalty term, we can enforce meaningful probabilistic (chance and conditional value at risk) constraints. Second, we quantify the effect of the penalty term on the value function, uncovering an improved value estimation procedure. We use these insights to propose a constrained multiagent advantage actor critic (C-MAA2C) algorithm.…
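As a rough illustration of the primal-dual mechanics the abstract describes, the sketch below shows how a penalty is subtracted from the reward and how the multiplier is updated by projected dual ascent. It is not the paper's C-MAA2C algorithm. All function names, the budget and step-size parameters, and the specific indicator and hinge penalty forms are assumptions made for this example.

```python
from typing import Callable, List, Sequence

Penalty = Callable[[float], float]


def constraint_penalty(g: float) -> float:
    """Standard choice: the constraint value g(s) itself is the penalty."""
    return g


def chance_penalty(g: float) -> float:
    """Assumed chance-constraint penalty: 1 if the constraint is violated, else 0."""
    return 1.0 if g > 0.0 else 0.0


def make_cvar_penalty(alpha: float) -> Penalty:
    """Assumed CVaR-style penalty: the hinge max(g - alpha, 0)."""
    def penalty(g: float) -> float:
        return max(g - alpha, 0.0)
    return penalty


def penalized_rewards(rewards: Sequence[float], g_values: Sequence[float],
                      lam: float, penalty: Penalty) -> List[float]:
    """Primal side: subtract lam * penalty(g) from each per-step reward."""
    return [r - lam * penalty(g) for r, g in zip(rewards, g_values)]


def discounted_average(values: Sequence[float], gamma: float) -> float:
    """Compute (1 - gamma) * sum_t gamma^t * values[t] over a finite trajectory."""
    return (1.0 - gamma) * sum(gamma ** t * v for t, v in enumerate(values))


def dual_ascent_step(lam: float, g_values: Sequence[float], penalty: Penalty,
                     budget: float, gamma: float, step_size: float) -> float:
    """Dual side: raise lam when the discounted penalty exceeds the budget,
    lower it otherwise, and project back onto lam >= 0."""
    excess = discounted_average([penalty(g) for g in g_values], gamma) - budget
    return max(0.0, lam + step_size * excess)
```

In a training loop, the penalized rewards would feed whatever policy-gradient update is in use, and `dual_ascent_step` would be called on sampled trajectories between policy updates. Swapping `constraint_penalty` for `chance_penalty` or `make_cvar_penalty(alpha)` changes which constraint the multiplier enforces without touching the rest of the loop.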
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
