Automata Learning meets Shielding

Martin Tappler; Stefan Pranger; Bettina Könighofer; Edi Muškardin; Roderick Bloem; Kim Larsen

arXiv:2212.01838·cs.LG·December 6, 2022

TL;DR

This paper presents an iterative method combining automata learning and shield synthesis to prevent safety violations in reinforcement learning agents exploring unknown environments.

Contribution

It introduces a novel approach that learns environment models and constructs safety shields during exploration to ensure safety in RL.

Findings

01 Shields effectively prevent safety violations during exploration.

02 Iterative learning improves shield accuracy over time.

03 The method was applied successfully in a case study on slippery Gridworlds.

Abstract

Safety is still one of the major research challenges in reinforcement learning (RL). In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments. Our approach combines automata learning for Markov Decision Processes (MDPs) and shield synthesis in an iterative approach. Initially, the MDP representing the environment is unknown. The agent starts exploring the environment and collects traces. From the collected traces, we passively learn MDPs that abstractly represent the safety-relevant aspects of the environment. Given a learned MDP and a safety specification, we construct a shield. For each state-action pair within a learned MDP, the shield computes exact probabilities on how likely it is that executing the action results in violating the specification from the current state within the next…
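The shield computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the learned MDP is already available as a plain dictionary, that "violating the specification" means reaching a state in a set `unsafe`, and that the look-ahead is a finite number of steps given by a hypothetical parameter `horizon`. The names `violation_risk`, `shielded_actions`, and `threshold`, as well as the fallback rule for states where every action is too risky, are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

State = str
Action = str
# mdp[state][action] -> list of (probability, successor state); assumed format
MDP = Dict[State, Dict[Action, List[Tuple[float, State]]]]


def violation_risk(mdp: MDP, unsafe: Set[State],
                   horizon: int) -> Dict[Tuple[State, Action], float]:
    """For each (state, action), return the minimal probability of reaching
    an unsafe state within `horizon` steps when taking the action and then
    acting as safely as possible. Assumes horizon >= 1."""
    # value[s]: minimal violation probability from s with 0 steps remaining
    value = {s: (1.0 if s in unsafe else 0.0) for s in mdp}
    risk: Dict[Tuple[State, Action], float] = {}
    for _ in range(horizon):
        # One backward step of bounded-horizon value iteration
        risk = {
            (s, a): sum(p * value[t] for p, t in successors)
            for s, actions in mdp.items()
            for a, successors in actions.items()
        }
        value = {
            s: 1.0 if s in unsafe
            else min((risk[(s, a)] for a in mdp[s]), default=0.0)
            for s in mdp
        }
    return risk


def shielded_actions(mdp: MDP, risk: Dict[Tuple[State, Action], float],
                     state: State, threshold: float) -> List[Action]:
    """Return the actions whose risk is at most `threshold`. If none
    qualifies, fall back to the least risky action (an assumption of
    this sketch, so the agent is never left without a choice)."""
    allowed = [a for a in mdp[state] if risk[(state, a)] <= threshold]
    if not allowed:
        allowed = [min(mdp[state], key=lambda a: risk[(state, a)])]
    return allowed


# Toy environment (hypothetical): "risky" may slip to "edge", next to a pit.
toy_mdp: MDP = {
    "start": {
        "safe": [(1.0, "goal")],
        "risky": [(0.8, "goal"), (0.2, "edge")],
    },
    "edge": {"step": [(0.5, "pit"), (0.5, "goal")]},
    "pit": {},
    "goal": {},
}

toy_risk = violation_risk(toy_mdp, unsafe={"pit"}, horizon=2)
allowed_at_start = shielded_actions(toy_mdp, toy_risk, "start", threshold=0.05)
allowed_at_edge = shielded_actions(toy_mdp, toy_risk, "edge", threshold=0.05)
```

In this toy example the shield blocks `risky` at `start`, because slipping to `edge` leaves a 10% chance of reaching the pit within two steps, while at `edge` the only available action is kept by the fallback rule. In the iterative setting the abstract describes, such a table would be recomputed each time a new MDP is learned from the collected traces.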

Code & Models

Repositories

des-lab/automata-learning-meets-shielding (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Optimization and Search Problems

Methods: Q-Learning