Online Shielding for Stochastic Systems
Bettina Könighofer, Julian Rudolf, Alexander Palmisano, Martin Tappler, Roderick Bloem

TL;DR
This paper introduces an online synthesis algorithm for runtime safety enforcers, called shields, in reinforcement learning systems, enabling real-time safety guarantees in stochastic environments.
Contribution
It presents a novel online shield synthesis method that reduces offline computation and memory needs, applicable to diverse planning problems with stochastic dynamics.
Findings
Effective safety enforcement in a stochastic Snake game.
Reduced offline computation compared to traditional methods.
Real-time safety guarantees demonstrated in a complex game environment.
Abstract
In this paper, we propose a method to develop trustworthy reinforcement learning systems. To ensure safety, especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that blocks all actions that are unsafe with respect to a temporal logic specification from the agent. Our main contribution is a new synthesis algorithm for computing the shield online. Existing offline shielding approaches exhaustively compute the safety of all state-action combinations ahead of time, resulting in huge offline computation times, large memory consumption, and significant delays at run-time due to the look-ups in a huge database. The intuition behind online shielding is to compute during run-time the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for…
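The idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes a small, explicitly given stochastic model, a finite-horizon safety probability, and a relative threshold for blocking actions. All names (`allowed_actions`, `reachable_states`, `horizon`, `threshold`) and the toy chain world are hypothetical and chosen only for illustration.

```python
from typing import Callable, Hashable, List, Set, Tuple

State = Hashable
Action = Hashable
# Hypothetical model interface: transitions(state, action) -> [(probability, next_state)]
Transitions = Callable[[State, Action], List[Tuple[float, State]]]


def state_safety(state, horizon, actions, transitions, is_unsafe) -> float:
    """Best achievable probability of avoiding unsafe states for `horizon` more steps."""
    if is_unsafe(state):
        return 0.0
    if horizon == 0:
        return 1.0
    return max(
        (action_safety(state, a, horizon, actions, transitions, is_unsafe)
         for a in actions(state)),
        default=1.0,  # a safe state with no actions is treated as safe
    )


def action_safety(state, action, horizon, actions, transitions, is_unsafe) -> float:
    """Probability of staying safe for `horizon` steps when starting with `action`."""
    return sum(
        prob * state_safety(nxt, horizon - 1, actions, transitions, is_unsafe)
        for prob, nxt in transitions(state, action)
    )


def allowed_actions(state, horizon, threshold, actions, transitions, is_unsafe) -> List[Action]:
    """Keep actions whose safety is at least `threshold` times the best action's safety.

    The relative threshold is an assumption of this sketch. If every action
    has safety 0, nothing is blocked, so the agent is never left without a move.
    """
    values = {a: action_safety(state, a, horizon, actions, transitions, is_unsafe)
              for a in actions(state)}
    best = max(values.values(), default=0.0)
    return [a for a, v in values.items() if v >= threshold * best]


def reachable_states(state, depth, actions, transitions) -> Set[State]:
    """States reachable from `state` within `depth` steps (including `state`)."""
    seen = {state}
    frontier = {state}
    for _ in range(depth):
        frontier = {nxt
                    for s in frontier
                    for a in actions(s)
                    for prob, nxt in transitions(s, a)
                    if prob > 0} - seen
        seen |= frontier
    return seen


# Toy chain world (hypothetical): positions 0..4, position 0 is unsafe.
# A move goes in the intended direction with probability 0.8, else the opposite way.
def actions(s):
    return ["left", "right"]


def transitions(s, a):
    step = -1 if a == "left" else 1
    clamp = lambda x: max(0, min(4, x))
    return [(0.8, clamp(s + step)), (0.2, clamp(s - step))]


def is_unsafe(s):
    return s == 0


# "Online" step: while the agent is at position 2, precompute the shield
# only for the states it could reach within one step.
shield = {
    s: allowed_actions(s, 1, 0.9, actions, transitions, is_unsafe)
    for s in reachable_states(2, 1, actions, transitions)
}
```

In this sketch, the shield is computed only for the handful of states near the agent's current position rather than for the whole state space, which is the trade-off the abstract describes. At position 1, next to the unsafe state, moving left is blocked because its one-step safety probability is far below that of moving right.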
