Safe Multi-Agent Reinforcement Learning via Shielding
Ingy Elsayed-Aly, Suda Bharadwaj, Christopher Amato, Rüdiger Ehlers, Ufuk Topcu, Lu Feng

TL;DR
This paper introduces two shielding methods, centralized and factored, for safe multi-agent reinforcement learning. Both guarantee safety during training without compromising the quality of the learned policies, and factored shielding scales better with the number of agents.
Contribution
It proposes centralized shielding and factored shielding, two approaches that guarantee safety during learning in MARL, where current methods offer no such guarantees.
Findings
Both shielding methods guarantee safety during learning.
Factored shielding is more scalable with the number of agents.
In the reported experiments, the safety guarantees do not compromise the quality of the learned policies.
Abstract
Multi-agent reinforcement learning (MARL) has been increasingly used in a wide range of safety-critical applications, which require guaranteed safety (e.g., no unsafe states are ever visited) during the learning process. Unfortunately, current MARL methods do not have safety guarantees. Therefore, we present two shielding approaches for safe MARL. In centralized shielding, we synthesize a single shield to monitor all agents' joint actions and correct any unsafe action if necessary. In factored shielding, we synthesize multiple shields based on a factorization of the joint state space observed by all agents; the set of shields monitors agents concurrently and each shield is only responsible for a subset of agents at each step. Experimental results show that both approaches can guarantee the safety of agents during learning without compromising the quality of learned policies; moreover,…
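To make the centralized idea concrete, here is a minimal sketch of a shield that monitors a joint action and corrects it when it would lead to an unsafe state. Everything in it is an assumption for illustration and is not taken from the paper: the 1-D corridor world, the "no two agents in the same cell" safety property, and the names `step`, `is_safe`, and `centralized_shield` are all hypothetical. The abstract says the shield is synthesized; this sketch swaps in a brute-force one-step lookahead, which conveys the monitor-and-correct behavior but is not a synthesis procedure.

```python
from itertools import product

# Hypothetical toy world (not from the paper): agents on a corridor of cells 0..SIZE-1.
SIZE = 5
ACTIONS = (-1, 0, 1)  # move left, stay, move right


def step(positions, joint_action):
    """Deterministic toy dynamics: each agent moves one cell and is clipped to the corridor."""
    return tuple(min(SIZE - 1, max(0, p + a)) for p, a in zip(positions, joint_action))


def is_safe(positions):
    """Assumed safety property: no two agents occupy the same cell."""
    return len(set(positions)) == len(positions)


def centralized_shield(positions, joint_action):
    """Pass a safe joint action through unchanged; otherwise substitute a safe one.

    Assumes `positions` is itself safe, so "everyone stays" is always a
    safe fallback and the candidate list is never empty.
    """
    if is_safe(step(positions, joint_action)):
        return joint_action
    # Enumerate every joint action and keep those with a safe successor state.
    candidates = [a for a in product(ACTIONS, repeat=len(positions))
                  if is_safe(step(positions, a))]
    # Prefer the substitute that overrides the fewest individual agents.
    return min(candidates, key=lambda a: sum(x != y for x, y in zip(a, joint_action)))


# Agents at cells 1 and 3 both propose moving into cell 2; the shield overrides one of them.
proposed = (1, -1)
corrected = centralized_shield((1, 3), proposed)
print(proposed, "->", corrected)
```

The enumeration over `product(ACTIONS, repeat=len(positions))` grows exponentially with the number of agents, which gives some intuition for why a single shield over all joint actions becomes expensive as agents are added.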
Taxonomy
Topics: Formal Methods in Verification · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
