Safe POMDP Online Planning via Shielding

Shili Sheng; David Parker; Lu Feng

arXiv:2309.10216 · cs.AI · March 5, 2024

TL;DR

This paper introduces shielding techniques integrated into POMCP for safe online planning in POMDPs, ensuring safety guarantees in large, uncertain environments with minimal runtime impact.

Contribution

It proposes four novel shielding methods for POMCP that guarantee safety in POMDPs with almost-sure reach-avoid specifications, including scalable factored variants.

Findings

01. Guarantees safety in large POMDPs

02. Minimal impact on online planning runtime

03. Effective shielding methods demonstrated on benchmarks

Abstract

Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty. POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return, but the resulting policies cannot provide safety guarantees, which are imperative for real-world safety-critical tasks (e.g., autonomous driving). In this work, we consider safety requirements represented as almost-sure reach-avoid specifications (i.e., the probability of reaching a set of goal states is one and the probability of reaching a set of unsafe states is zero). We compute shields that restrict unsafe actions which would violate the almost-sure reach-avoid specifications. We then integrate these shields into the POMCP algorithm for safe POMDP online planning. We…
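
To make the idea of a shield that restricts unsafe actions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes the shield is given as a precomputed set of "winning" belief supports (sets of states the agent might be in) and that an action is allowed only if every belief support it can lead to stays inside that set. The toy model, the names `T`, `OBS`, `WINNING`, and the functions `successor_supports` and `allowed_actions` are all hypothetical.

```python
# Hypothetical toy POMDP: state 0 = start, 1 = goal, 2 = unsafe.
# T[s][a] is the set of states reachable from s under a with positive probability.
T = {
    0: {"safe": {0, 1}, "risky": {1, 2}},
    1: {"safe": {1}, "risky": {1}},   # goal is absorbing
    2: {"safe": {2}, "risky": {2}},   # unsafe is absorbing
}
OBS = {0: "start", 1: "goal", 2: "unsafe"}  # observation emitted in each state
ACTIONS = ["safe", "risky"]

# Assumed to be precomputed offline: belief supports from which the
# reach-avoid specification can still be satisfied almost surely.
WINNING = {frozenset({0}), frozenset({1})}


def successor_supports(support, action, transitions, obs):
    """Return the belief supports reachable from `support` under `action`,
    one per observation that can be received."""
    by_obs = {}
    for s in support:
        for s_next in transitions[s][action]:
            by_obs.setdefault(obs[s_next], set()).add(s_next)
    return [frozenset(states) for states in by_obs.values()]


def allowed_actions(support, actions, transitions, obs, winning):
    """Shield: keep only actions whose every successor support is winning."""
    return [
        a for a in actions
        if all(b in winning
               for b in successor_supports(support, a, transitions, obs))
    ]


if __name__ == "__main__":
    # From the start state, "risky" may lead to the unsafe state, so it is blocked.
    print(allowed_actions(frozenset({0}), ACTIONS, T, OBS, WINNING))
```

In a POMCP-style planner, a filter like `allowed_actions` could plausibly be applied wherever the search chooses among actions, so that simulations and the final action choice only consider shielded actions. How the paper's four methods compute and apply their shields is not specified in the truncated abstract above.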

Taxonomy

Topics: Advanced Malware Detection Techniques · Software Testing and Debugging Techniques · Web Application Security Vulnerabilities