Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning

Satchit Chatterji; Erman Acar

arXiv:2411.04867 · cs.AI · August 28, 2025

TL;DR

This paper introduces SMARL, a framework that extends probabilistic logic shields to multi-agent reinforcement learning, improving safety and cooperation in complex multi-agent environments.

Contribution

The paper proposes probabilistic logic-based methods for multi-agent RL: a Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, and a probabilistic logic policy gradient method for shielded PPO with safety guarantees.

Findings

01. Fewer constraint violations in multi-agent benchmarks

02. Enhanced cooperation under normative constraints

03. Effective for equilibrium selection in multi-agent systems

Abstract

Safe reinforcement learning (RL) is crucial for real-world applications, and multi-agent interactions introduce additional safety challenges. While Probabilistic Logic Shields (PLS) have been a powerful proposal to enforce safety in single-agent RL, their generalizability to multi-agent settings remains unexplored. In this paper, we address this gap by conducting extensive analyses of PLS within decentralized, multi-agent environments, and in doing so, propose Shielded Multi-Agent Reinforcement Learning (SMARL) as a general framework for steering MARL towards norm-compliant outcomes. Our key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, which incorporates probabilistic constraints directly into the value update process; (2) a probabilistic logic policy gradient method for shielded PPO with formal…
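
To make the shielding idea concrete, the sketch below shows one way a probabilistic shield can reweight a base policy: each action's probability is multiplied by an assumed probability that the action is safe, and the result is renormalized. This is an illustrative sketch only, not the paper's PLTD update or its shielded PPO method. The function name `shield_policy`, the three-action example, and the hard-coded safety probabilities are all hypothetical; in a PLS setting those probabilities would typically be derived from a probabilistic logic program rather than written by hand.

```python
def shield_policy(policy, safe_prob):
    """Reweight a base policy by per-action safety probabilities.

    policy:    base action probabilities pi(a|s), assumed to sum to 1
    safe_prob: assumed P(safe | s, a) for each action, each in [0, 1]

    Returns the reweighted (shielded) policy and the probability that
    the base policy would have acted safely in this state.
    """
    if len(policy) != len(safe_prob):
        raise ValueError("policy and safe_prob must have the same length")
    # Joint probability of picking action a and a being safe.
    weighted = [p * q for p, q in zip(policy, safe_prob)]
    # Normalizer: overall probability that the base policy acts safely.
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("no action carries any safe probability mass")
    return [w / total for w in weighted], total


# Hypothetical example: action 0 is likely but mostly unsafe.
base = [0.5, 0.3, 0.2]
safety = [0.1, 0.9, 1.0]
shielded, p_safe = shield_policy(base, safety)
print(shielded, p_safe)
```

In this example the shielded policy shifts probability mass away from the first action and toward the two safer ones, while remaining a valid distribution. An action with safety probability 0 would receive zero probability after shielding.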

Code & Models

Repositories

satchitchatterji/shieldedmarlthesis
PyTorch · Official

Taxonomy

Topics: Safety Systems Engineering in Autonomy

Methods: Entropy Regularization · Q-Learning · Proximal Policy Optimization