Rule-based Shielding for Partially Observable Monte-Carlo Planning
Giulio Mazzi, Alberto Castellini, Alessandro Farinelli

TL;DR
This paper introduces a shielding method for POMCP that uses logical formulas derived from expert knowledge to identify and prevent unexpected actions, improving policy reliability and interpretability in POMDPs.
Contribution
It proposes a novel SMT-based approach to detect and shield unexpected actions in POMCP, enhancing policy safety and interpretability without sacrificing performance.
Findings
Shielded POMCP outperforms standard POMCP in benchmark tests.
The approach remains effective even when some parameters of the logical formulas are incorrect.
The approach improves policy interpretability and eases policy verification in POMDPs.
Abstract
Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes (POMDPs). The online nature of this method supports scalability by avoiding a complete policy representation. The lack of an explicit representation, however, hinders policy interpretability and makes policy verification very complex. In this work, we propose two contributions. The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task. The second is a shielding approach that prevents POMCP from selecting unexpected actions. The first method is based on Satisfiability Modulo Theory (SMT). It inspects traces (i.e., sequences of belief-action-observation triplets) generated by POMCP to compute the parameters of logical formulas about policy properties defined by…
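The two steps described in the abstract can be illustrated with a minimal Python sketch. It assumes a single hypothetical rule template, "select `action` only when the belief probability p of some state feature is at least rho", and summarises each trace step by that probability and the chosen action (observations are dropped because this template does not use them). The paper computes formula parameters with an SMT solver; here a brute-force search over the observed probabilities stands in for it, keeping the threshold that agrees with the most trace steps, in the spirit of maximising the number of satisfied constraints. The names `Step`, `fit_rule`, `unexpected_steps` and `shielded_choice`, the example trace and the Q-values are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    """One trace step (hypothetical summary): a belief probability and the chosen action."""
    p: float      # belief probability of a hypothetical state feature
    action: str   # action selected by POMCP at this step


def fit_rule(trace: List[Step], action: str) -> float:
    """Fit rho for the assumed template 'select `action` iff p >= rho'.

    Brute-force stand-in for an SMT solver: every observed p (plus +inf,
    meaning "never select") is a candidate, and the candidate that agrees
    with the most steps wins; ties go to the lowest threshold.
    """
    candidates = sorted({s.p for s in trace}) + [float("inf")]

    def agreement(rho: float) -> int:
        # A step agrees when "rule fires" matches "action was chosen".
        return sum((s.p >= rho) == (s.action == action) for s in trace)

    return max(candidates, key=agreement)


def unexpected_steps(trace: List[Step], action: str, rho: float) -> List[int]:
    """Indices of steps whose decision disagrees with the fitted rule."""
    return [i for i, s in enumerate(trace) if (s.p >= rho) != (s.action == action)]


def shielded_choice(q_values: Dict[str, float], p: float, action: str, rho: float) -> str:
    """Pick the best action by Q-value, blocking `action` when p < rho."""
    allowed = {a: q for a, q in q_values.items() if a != action or p >= rho}
    if not allowed:  # nothing left to choose from: fall back to the unshielded choice
        allowed = q_values
    return max(allowed, key=allowed.get)


if __name__ == "__main__":
    trace = [Step(0.9, "fast"), Step(0.8, "fast"), Step(0.4, "slow"),
             Step(0.3, "slow"), Step(0.2, "slow"), Step(0.25, "fast")]
    rho = fit_rule(trace, "fast")
    print(rho)                                  # 0.8
    print(unexpected_steps(trace, "fast", rho)) # [5]: "fast" chosen with p = 0.25
    q = {"fast": 1.2, "slow": 1.0}
    print(shielded_choice(q, 0.3, "fast", rho))  # slow (fast is blocked)
    print(shielded_choice(q, 0.85, "fast", rho)) # fast
```

In this sketch the shield only filters the final action choice from the root Q-values; a real integration might also restrict actions inside the Monte-Carlo search, which is not attempted here.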
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
