Generating Local Shields for Decentralised Partially Observable Markov Decision Processes
Haoran Yang (University of Oxford), Nobuko Yoshida (University of Oxford)

TL;DR
This paper introduces a method for generating local safety shields in decentralised partially observable Markov decision processes (Dec-POMDPs), improving multi-agent safety without requiring access to a shared global state.
Contribution
The paper develops a process-algebra-based approach for specifying safe joint behaviours and compiling them into local shields, and integrates with PRISM for safety probability analysis.
Findings
Substantially reduces collisions in a multi-agent path-finding case study
Creates belief-based local Mealy machines for safe action filtering
Demonstrates varying levels of conservatism and expressiveness in shields
Abstract
Multi-agent systems under partial observation often struggle to maintain safety because each agent's locally chosen action does not, in general, determine the resulting joint action. Shielding addresses this by filtering actions based on the current state, but most existing techniques either assume access to a shared centralised global state or employ memoryless local filters that cannot consider interaction history. We introduce a shield process algebra with guarded choice and recursion for specifying safe global behaviour in communication-free Dec-POMDP settings. From a shield process, we compile a process automaton, then a global Mealy machine as a safe joint-action filter, and finally project it to local Mealy machines whose states are belief-style subsets of the global Mealy machine states consistent with each agent's observations, and which output per-agent safe action sets.…
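The projection step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state machine, the observation and action names, the tables `ACTIONS`, `SAFE`, and `TRANSITIONS`, and the helpers `update_belief` and `local_safe_actions` are all hypothetical. The filtering rule is one deliberately conservative choice (keep a local action only if it is safe in every global state of the belief, whatever the other agents do); the paper's actual construction may differ.

```python
from itertools import product

# Hypothetical two-agent example: each agent can "go" or "wait".
ACTIONS = [("go", "wait"), ("go", "wait")]

# Safe joint actions per global Mealy machine state (assumed for illustration).
# In q1 the agents are close, so both moving at once is unsafe.
SAFE = {
    "q0": set(product(*ACTIONS)),
    "q1": set(product(*ACTIONS)) - {("go", "go")},
}

# Global transitions: (state, joint observation) -> next state (assumed).
TRANSITIONS = {
    ("q0", ("clear", "clear")): "q0",
    ("q0", ("clear", "close")): "q1",
    ("q0", ("close", "close")): "q1",
    ("q1", ("clear", "clear")): "q0",
    ("q1", ("clear", "close")): "q1",
    ("q1", ("close", "close")): "q1",
}


def update_belief(belief, agent, local_obs):
    """Return the global states reachable from `belief` under any joint
    observation whose component for `agent` equals `local_obs`."""
    return frozenset(
        nxt
        for (state, joint_obs), nxt in TRANSITIONS.items()
        if state in belief and joint_obs[agent] == local_obs
    )


def local_safe_actions(belief, agent):
    """Conservative local filter: keep an action only if every joint action
    containing it is safe in every global state of the belief."""
    others = [acts for i, acts in enumerate(ACTIONS) if i != agent]
    allowed = set()
    for action in ACTIONS[agent]:
        if all(
            rest[:agent] + (action,) + rest[agent:] in SAFE[state]
            for state in belief
            for rest in product(*others)
        ):
            allowed.add(action)
    return allowed


# Agent 0 sees "clear" but cannot rule out that agent 1 saw "close".
b0 = update_belief(frozenset({"q0"}), agent=0, local_obs="clear")
print(sorted(b0), sorted(local_safe_actions(b0, 0)))  # ['q0', 'q1'] ['wait']

# Agent 1 seeing "clear" pins the global state down to q0.
b1 = update_belief(frozenset({"q0"}), agent=1, local_obs="clear")
print(sorted(b1), sorted(local_safe_actions(b1, 1)))  # ['q0'] ['go', 'wait']
```

In this toy setup the same local observation leaves agent 0 with a two-state belief and a single permitted action, while agent 1 keeps both actions. This is the kind of trade-off between conservatism and expressiveness that a belief-based local filter has to manage.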
