Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning
Satchit Chatterji, Erman Acar

TL;DR
This paper introduces SMARL, a framework that extends probabilistic logic shields to multi-agent reinforcement learning, improving safety and cooperation in complex multi-agent environments.
Contribution
The paper proposes probabilistic logic-based methods for multi-agent RL: a Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, and a probabilistic logic policy gradient method for shielded PPO with safety guarantees.
Findings
Fewer constraint violations in multi-agent benchmarks
Enhanced cooperation under normative constraints
Effective for equilibrium selection in multi-agent systems
Abstract
Safe reinforcement learning (RL) is crucial for real-world applications, and multi-agent interactions introduce additional safety challenges. While Probabilistic Logic Shields (PLS) have been a powerful proposal for enforcing safety in single-agent RL, their generalizability to multi-agent settings remains unexplored. In this paper, we address this gap by conducting extensive analyses of PLS within decentralized, multi-agent environments, and in doing so, propose SMARL as a general framework for steering MARL towards norm-compliant outcomes. Our key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, which incorporates probabilistic constraints directly into the value update process; (2) a probabilistic logic policy gradient method for shielded PPO with formal…
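As a rough illustration of the shielding idea the abstract builds on, the sketch below reweights a base policy by per-action safety probabilities and renormalizes. It is not the paper's PLTD update or its shielded PPO method, neither of which is reproduced here. The function name `shield_policy`, the example numbers, and the fallback for the case where no action has any probability of being safe are all assumptions made for illustration.

```python
def shield_policy(base_probs, safe_probs):
    """Reweight a base policy by the probability that each action is safe.

    base_probs[i]: probability the unshielded policy picks action i.
    safe_probs[i]: probability that action i satisfies the safety constraints.
    Both inputs are hypothetical; a real shield would derive safe_probs
    from a probabilistic logic program.
    """
    if len(base_probs) != len(safe_probs):
        raise ValueError("base_probs and safe_probs must have the same length")

    # Joint probability of choosing each action and that action being safe.
    weighted = [p * s for p, s in zip(base_probs, safe_probs)]

    # Probability that the base policy acts safely in this state.
    total = sum(weighted)
    if total == 0.0:
        # Assumption: with no safe mass at all, fall back to the base policy.
        return list(base_probs)

    # Renormalize so the shielded policy is a valid distribution.
    return [w / total for w in weighted]


# Example with three actions; the third is certainly unsafe.
base = [0.5, 0.3, 0.2]
safe = [1.0, 0.5, 0.0]
shielded = shield_policy(base, safe)
```

In this example the certainly unsafe action receives zero probability, and the remaining mass shifts toward the action most likely to be safe. In a decentralized multi-agent setting, each agent could apply such a reweighting to its own policy independently, though how the paper combines shields across agents is not specified in this summary.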
Taxonomy
Topics: Safety Systems Engineering in Autonomy
Methods: Entropy Regularization · Q-Learning · Proximal Policy Optimization
