Automata Learning meets Shielding
Martin Tappler, Stefan Pranger, Bettina Könighofer, Edi Muškardin, Roderick Bloem, Kim Larsen

TL;DR
This paper presents an iterative method combining automata learning and shield synthesis to prevent safety violations in reinforcement learning agents exploring unknown environments.
Contribution
It introduces a novel approach that learns environment models and constructs safety shields during exploration to ensure safety in RL.
Findings
Shields effectively prevent safety violations during exploration.
Iterative learning improves shield accuracy over time.
The method was applied successfully to a slippery Gridworlds case study.
Abstract
Safety is still one of the major research challenges in reinforcement learning (RL). In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments. Our approach combines automata learning for Markov Decision Processes (MDPs) and shield synthesis in an iterative approach. Initially, the MDP representing the environment is unknown. The agent starts exploring the environment and collects traces. From the collected traces, we passively learn MDPs that abstractly represent the safety-relevant aspects of the environment. Given a learned MDP and a safety specification, we construct a shield. For each state-action pair within a learned MDP, the shield computes exact probabilities on how likely it is that executing the action results in violating the specification from the current state within the next…
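The shield computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small hand-written MDP in place of a learned one, a set of unsafe states in place of a full safety specification, and a fixed look-ahead bound: the abstract is cut off before stating the bound, so `horizon` is a hypothetical parameter. The absolute `threshold` rule for blocking actions and all state and action names are also assumptions made for illustration.

```python
def violation_probabilities(mdp, unsafe, horizon):
    """Return risk[(s, a)]: the probability of reaching an unsafe state within
    `horizon` steps when taking action a in state s and then acting as safely
    as possible. `mdp` maps state -> action -> {successor: probability}."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # Collect every state mentioned anywhere in the model.
    states = set(mdp) | set(unsafe)
    for actions in mdp.values():
        for dist in actions.values():
            states |= set(dist)
    # v[s]: minimal violation probability within the steps considered so far.
    v = {s: 1.0 if s in unsafe else 0.0 for s in states}
    risk = {}
    for _ in range(horizon):
        # One backward step: risk of each action given the values so far.
        risk = {
            (s, a): sum(p * v[t] for t, p in dist.items())
            for s, actions in mdp.items() if s not in unsafe
            for a, dist in actions.items()
        }
        new_v = dict(v)
        for s, actions in mdp.items():
            if s not in unsafe and actions:
                new_v[s] = min(risk[(s, a)] for a in actions)
        v = new_v
    return risk


def shielded_actions(risk, state, threshold):
    """Actions whose risk is at most `threshold` (an assumed blocking rule).
    If every action is too risky, fall back to the least risky ones."""
    options = {a: r for (s, a), r in risk.items() if s == state}
    allowed = [a for a, r in options.items() if r <= threshold]
    if not allowed:
        best = min(options.values())
        allowed = [a for a, r in options.items() if r == best]
    return allowed


# Hypothetical toy model: "hole" is unsafe, "goal" is terminal and safe.
mdp = {
    "start": {"left": {"goal": 0.9, "hole": 0.1}, "right": {"edge": 1.0}},
    "edge": {"left": {"goal": 0.5, "hole": 0.5}, "up": {"goal": 0.8, "hole": 0.2}},
}
risk = violation_probabilities(mdp, unsafe={"hole"}, horizon=2)
allowed_at_start = shielded_actions(risk, "start", threshold=0.15)
```

With these made-up numbers, `right` in `start` has no immediate risk but a two-step risk of 0.2, because every action available in `edge` can lead to `hole`. A threshold of 0.15 therefore leaves only `left` in `start`. In the iterative setting the abstract describes, the risk table would be recomputed each time a new MDP is learned from the collected traces.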
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Optimization and Search Problems
Methods: Q-Learning
