Rule-based Shielding for Partially Observable Monte-Carlo Planning

Giulio Mazzi; Alberto Castellini; Alessandro Farinelli

arXiv:2104.13791·cs.AI·April 29, 2021

TL;DR

This paper introduces a shielding method for POMCP that uses logical formulas derived from expert knowledge to identify and prevent unexpected actions, improving policy reliability and interpretability in POMDPs.

Contribution

The paper proposes an SMT-based method for detecting unexpected actions selected by POMCP, together with a shield that prevents them, improving policy safety and interpretability without sacrificing performance.

Findings

01. Shielded POMCP outperforms standard POMCP in benchmark tests.

02. The approach maintains effectiveness even with some incorrect logical parameters.

03. The approach improves policy interpretability and verification in POMDPs.

Abstract

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. The lack of an explicit representation however hinders policy interpretability and makes policy verification very complex. In this work, we propose two contributions. The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task. The second is a shielding approach that prevents POMCP from selecting unexpected actions. The first method is based on Satisfiability Modulo Theory (SMT). It inspects traces (i.e., sequences of belief-action-observation triplets) generated by POMCP to compute the parameters of logical formulas about policy properties defined by…
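The two ideas in the abstract (fitting the parameters of a rule from traces, then shielding actions that violate the rule) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a Tiger-like problem where the belief is a single probability, a hypothetical rule template "select `open_right` only when the belief is at least x", and a brute-force search over candidate thresholds as a stand-in for the SMT step. The trace values and the names `fit_threshold`, `unexpected_steps`, and `shielded_choice` are all made up for illustration.

```python
from typing import Dict, List, Tuple

# One trace step: (belief that the tiger is behind the left door, action, observation).
# All values below are invented for illustration.
Step = Tuple[float, str, str]

def fit_threshold(trace: List[Step], action: str) -> float:
    """Choose x so that 'action is selected iff belief >= x' agrees with the most steps.

    Brute-force stand-in for the SMT-based parameter computation.
    """
    candidates = sorted({b for b, _, _ in trace})

    def agreement(x: float) -> int:
        return sum((a == action) == (b >= x) for b, a, _ in trace)

    return max(candidates, key=agreement)

def unexpected_steps(trace: List[Step], action: str, x: float) -> List[int]:
    """Indices of steps where the action was taken although the rule forbids it."""
    return [i for i, (b, a, _) in enumerate(trace) if a == action and b < x]

def shielded_choice(belief: float, action_values: Dict[str, float],
                    action: str, x: float) -> str:
    """Pick the best-valued action after removing the one the rule forbids."""
    allowed = {a: v for a, v in action_values.items()
               if not (a == action and belief < x)}
    return max(allowed, key=allowed.get)

trace: List[Step] = [
    (0.50, "listen", "hear_left"),
    (0.70, "listen", "hear_left"),
    (0.85, "listen", "hear_left"),
    (0.97, "open_right", "none"),
    (0.15, "listen", "hear_right"),
    (0.60, "open_right", "none"),   # taken with low confidence: candidate anomaly
    (0.96, "open_right", "none"),
]

x = fit_threshold(trace, "open_right")
anomalies = unexpected_steps(trace, "open_right", x)
values = {"listen": -1.0, "open_right": 2.0, "open_left": -50.0}
safe_action = shielded_choice(0.60, values, "open_right", x)
```

With the fitted threshold, the low-confidence `open_right` step is flagged as unexpected, and the shield falls back to the next-best action when the same situation recurs. A real shield would act inside the planner rather than on a final table of action values; that detail is omitted here.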

Code & Models

Repositories

GiuMaz/ICAPS-2021-supmat (official)

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics