Safe POMDP Online Planning via Shielding
Shili Sheng, David Parker, Lu Feng

TL;DR
This paper introduces shielding techniques integrated into POMCP for safe online planning in POMDPs, ensuring safety guarantees in large, uncertain environments with minimal runtime impact.
Contribution
It proposes four novel shielding methods for POMCP that guarantee safety in POMDPs with almost-sure reach-avoid specifications, including scalable factored variants.
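To illustrate what shielding POMCP can mean in practice, here is a minimal Python sketch of one possible integration point: restricting UCB1 action selection at a tree node to the actions a shield permits. It assumes, hypothetically, that the shield is a precomputed function from a belief support (the set of distinct states among the node's particles) to a set of allowed actions. The names `shielded_ucb_action`, `counts`, `values`, `shield`, and `demo_shield` are illustrative and not taken from the paper, and the sketch does not claim to reproduce any particular one of its four methods.

```python
import math
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set

State = Hashable
Action = Hashable


def shielded_ucb_action(
    particles: Iterable[State],
    actions: List[Action],
    counts: Dict[Action, int],
    values: Dict[Action, float],
    shield: Callable[[FrozenSet[State]], Set[Action]],
    c: float = 1.0,
) -> Action:
    """UCB1 action selection restricted to shield-approved actions (sketch)."""
    # Belief support: the distinct states represented by the node's particles.
    support = frozenset(particles)
    allowed_set = shield(support)
    # Keep only the actions the (hypothetical) shield permits for this support.
    allowed = [a for a in actions if a in allowed_set]
    if not allowed:
        raise ValueError("shield permits no action for this belief support")
    total = sum(counts.get(a, 0) for a in allowed)

    def ucb(a: Action) -> float:
        n = counts.get(a, 0)
        if n == 0:
            return math.inf  # try every allowed action at least once
        return values[a] + c * math.sqrt(math.log(total) / n)

    return max(allowed, key=ucb)


def demo_shield(support: FrozenSet[State]) -> Set[Action]:
    # Hypothetical shield: action "b" is deemed unsafe for every support.
    return {"a", "c"}


# "b" has the highest value estimate but is never chosen because it is shielded.
print(shielded_ucb_action([0, 0, 1], ["a", "b", "c"],
                          {"a": 10, "b": 1, "c": 10},
                          {"a": 1.0, "b": 100.0, "c": 0.5},
                          demo_shield))  # prints a
```

Filtering before the UCB comparison means unsafe actions are never expanded in the search tree, so planning effort is spent only on permitted branches.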
Findings
Guarantees safety in large POMDPs
Minimal impact on online planning runtime
Effective shielding methods demonstrated on benchmarks
Abstract
Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty. POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return. However, the resulting policies cannot provide safety guarantees, which are imperative for real-world safety-critical tasks (e.g., autonomous driving). In this work, we consider safety requirements represented as almost-sure reach-avoid specifications (i.e., the probability of reaching a set of goal states is one and the probability of reaching a set of unsafe states is zero). We compute shields that restrict unsafe actions that would violate the almost-sure reach-avoid specifications. We then integrate these shields into the POMCP algorithm for safe POMDP online planning. We…
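For intuition about what a shield for an almost-sure reach-avoid specification contains, the Python sketch below handles the simpler, fully observable (MDP) case with the classic graph-based nested fixpoint: repeatedly discard candidate states from which the goal cannot be reached using only actions whose every possible successor stays inside the current candidate set. Only the support of the transition function matters (which successors have positive probability), not the exact probabilities. The paper works with POMDPs, where an analogous analysis is carried out over belief supports rather than individual states; that extension is not shown. All names (`almost_sure_winning`, `shield`, `support`) and the example model are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
Action = Hashable


def almost_sure_winning(
    states: Set[State],
    actions: Dict[State, List[Action]],
    support: Dict[Tuple[State, Action], Set[State]],
    goal: Set[State],
    avoid: Set[State],
) -> Set[State]:
    """States from which goal is reached with probability 1 and avoid with
    probability 0 (MDP case; goal and avoid assumed disjoint)."""
    win = set(states) - set(avoid)
    while True:
        # Backward search: states that reach the goal with positive probability
        # using only actions whose successors all stay inside `win`.
        reach = set(goal) & win
        changed = True
        while changed:
            changed = False
            for s in win - reach:
                for a in actions[s]:
                    succ = support[(s, a)]
                    if succ <= win and succ & reach:
                        reach.add(s)
                        changed = True
                        break
        if reach == win:  # fixpoint: every remaining state is winning
            return win
        win = reach  # shrink the candidate set and re-check


def shield(s: State, actions, support, win: Set[State]) -> Set[Action]:
    """Allowed actions at s: those whose successors all remain in win.
    A policy that stays in win and gives every allowed action positive
    probability reaches the goal almost surely."""
    return {a for a in actions[s] if support[(s, a)] <= win}


states = {0, 1, 2, 3, 4, 5}
actions = {0: ["a", "b"], 1: ["a", "b"], 2: ["a"], 3: ["a"], 4: ["a"], 5: ["a"]}
support = {
    (0, "a"): {1}, (0, "b"): {1, 2},
    (1, "a"): {0, 3}, (1, "b"): {2},
    (2, "a"): {2},
    (3, "a"): {3},
    (4, "a"): {2, 4},
    (5, "a"): {3, 4},
}
win = almost_sure_winning(states, actions, support, goal={3}, avoid={2})
print(sorted(win))  # prints [0, 1, 3]
print(shield(0, actions, support, win))  # prints {'a'}
```

In the example, state 5 initially looks safe because it can reach the goal, but it is discarded once state 4 is removed; this cascading effect is why the outer loop repeats until the candidate set stops shrinking.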
