Approximate Model-Based Shielding for Safe Reinforcement Learning
Alexander W. Goodall, Francesco Belardinelli

TL;DR
This paper introduces approximate model-based shielding (AMBS), a look-ahead shielding method for safe reinforcement learning that does not require prior knowledge of the system's safety-relevant dynamics, and reports better performance than other safety-aware approaches on a set of Atari games with state-dependent safety labels.
Contribution
The paper presents AMBS, a new look-ahead shielding algorithm for safe RL that operates without prior safety dynamics knowledge, with strong theoretical backing.
Findings
AMBS outperforms existing safety-aware methods on Atari games.
AMBS is supported by a theoretical justification of its safety performance.
AMBS can be applied without prior knowledge of the system's safety-relevant dynamics.
Abstract
Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real world is not easy, as many algorithms are sample-inefficient and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies w.r.t. a set of given safety constraints. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We provide a strong theoretical justification for AMBS and demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
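To make the idea of look-ahead shielding concrete, here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify these details: the Monte Carlo rollout estimate, the `threshold`, the `backup_policy`, and every function name below are assumptions chosen for illustration. The sketch simulates the proposed action with an approximate model, estimates how often the rollouts reach an unsafe state within a fixed horizon, and overrides the action when that estimate is too high.

```python
import random
from typing import Callable

# Hypothetical types for a toy problem: states and actions are integers.
State = int
Action = int


def estimate_violation_prob(
    state: State,
    first_action: Action,
    policy: Callable[[State], Action],
    model_step: Callable[[State, Action, random.Random], State],
    is_unsafe: Callable[[State], bool],
    horizon: int,
    n_rollouts: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of reaching an unsafe state within `horizon` steps."""
    violations = 0
    for _ in range(n_rollouts):
        # Step 1 uses the proposed action; later steps follow the policy.
        s = model_step(state, first_action, rng)
        unsafe = is_unsafe(s)
        for _ in range(horizon - 1):
            if unsafe:
                break
            s = model_step(s, policy(s), rng)
            unsafe = is_unsafe(s)
        violations += int(unsafe)
    return violations / n_rollouts


def shielded_action(
    state: State,
    task_policy: Callable[[State], Action],
    backup_policy: Callable[[State], Action],
    model_step: Callable[[State, Action, random.Random], State],
    is_unsafe: Callable[[State], bool],
    horizon: int = 5,
    n_rollouts: int = 100,
    threshold: float = 0.1,  # assumed tolerance on the violation estimate
    seed: int = 0,
) -> Action:
    """Return the task policy's action unless the look-ahead deems it too risky."""
    rng = random.Random(seed)
    proposed = task_policy(state)
    p_unsafe = estimate_violation_prob(
        state, proposed, task_policy, model_step, is_unsafe,
        horizon, n_rollouts, rng,
    )
    return proposed if p_unsafe <= threshold else backup_policy(state)


# Toy 1-D chain: an action moves the state by +1 or -1; states >= 3 are unsafe.
# A learned model would be approximate and stochastic; this one is deterministic
# so the behaviour is easy to check by hand.
def toy_model_step(s: State, a: Action, rng: random.Random) -> State:
    return s + a


def toy_is_unsafe(s: State) -> bool:
    return s >= 3


def go_right(s: State) -> Action:
    return 1


def go_left(s: State) -> Action:
    return -1
```

In the toy chain, calling `shielded_action(0, go_right, go_left, toy_model_step, toy_is_unsafe, horizon=5)` overrides the proposed move, because every simulated rollout reaches state 3 within five steps. With `horizon=2` the rollouts stop at state 2 and the proposed action is kept. In the setting the abstract describes, the hand-written `toy_model_step` would be replaced by a model learned from experience, since the safety-relevant dynamics are not assumed to be known in advance.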
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
