Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees

Mohammadhosein Hasanbeig; Yiannis Kantaros; Alessandro Abate; Daniel Kroening; George J. Pappas; Insup Lee

arXiv:1909.05304·cs.LO·September 13, 2019

TL;DR

This paper introduces a model-free reinforcement learning approach that synthesizes control policies maximizing the probability of satisfying Linear Temporal Logic specifications in uncertain, probabilistically labeled environments, with guarantees on the satisfaction probability.

Contribution

The paper presents a model-free RL algorithm for LTL control synthesis that handles uncertain workspace labels, unknown workspace structure, and stochastic agent actions, with theoretical guarantees on the satisfaction probability.

Findings

01

The RL algorithm asymptotically maximizes satisfaction probability.

02

Experimental results demonstrate the method's efficiency.

03

The approach handles uncertainty in the workspace properties, the workspace structure, and the agent actions.

Abstract

Reinforcement Learning (RL) has emerged as an efficient method of choice for solving complex sequential decision-making problems in automatic control, computer science, economics, and biology. In this paper we present a model-free RL algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving rise to a Probabilistically-Labeled Markov Decision Process (PL-MDP) with unknown graph structure and stochastic behaviour, which is an even more general case than a fully unknown MDP. We first translate the LTL specification into a Limit Deterministic Büchi Automaton (LDBA), which is then used in an on-the-fly product with the PL-MDP. Thereafter, we define a synchronous reward function…
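
The pipeline described in the abstract (automaton from the specification, on-the-fly product with the environment, reward tied to the automaton) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or from the lcrl repository: the corridor environment, the label probabilities, the two-state automaton for the formula "eventually goal" (which stands in for a general LDBA), and the simplified reward that pays 1 on entering the accepting state (which stands in for the paper's synchronous reward function). The learner is plain tabular Q-learning over product states (s, q) that are created lazily as they are visited.

```python
import random
from collections import defaultdict

# Hypothetical automaton for "eventually goal" (F goal).
# State 0 is initial; state 1 is accepting and absorbing.
def automaton_step(q, label):
    return 1 if (q == 1 or label == "goal") else 0

ACCEPTING = {1}

# Hypothetical corridor with cells 0..4. The learner never reads this model;
# it only samples transitions and labels from it.
N_CELLS = 5
ACTIONS = (-1, +1)

def env_step(s, a, rng):
    # Action uncertainty: with probability 0.1 the move is reversed.
    if rng.random() < 0.1:
        a = -a
    return min(max(s + a, 0), N_CELLS - 1)

def observe_label(s, rng):
    # Probabilistic labelling: the last cell emits "goal" with probability 0.9.
    if s == N_CELLS - 1 and rng.random() < 0.9:
        return "goal"
    return "none"

def train(episodes=3000, horizon=30, alpha=0.1, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    # Q is keyed by ((s, q), a); product states appear only when visited,
    # which is what "on-the-fly" means in this sketch.
    Q = defaultdict(float)
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(horizon):
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[((s, q), b)])
            s2 = env_step(s, a, rng)
            q2 = automaton_step(q, observe_label(s2, rng))
            done = q2 in ACCEPTING
            # Simplified reward (an assumption): 1 on reaching acceptance.
            r = 1.0 if done else 0.0
            if done:
                target = r
            else:
                target = gamma * max(Q[((s2, q2), b)] for b in ACTIONS)
            Q[((s, q), a)] += alpha * (target - Q[((s, q), a)])
            s, q = s2, q2
            if done:
                break
    return Q

Q = train()
# Greedy action in each cell while the automaton is still in its initial state.
greedy = {s: max(ACTIONS, key=lambda b: Q[((s, 0), b)]) for s in range(N_CELLS)}
```

Because the reward depends only on the automaton state, the same learning loop could in principle be reused with a different specification by swapping `automaton_step` and `ACCEPTING`. A Büchi acceptance condition, which requires visiting accepting states infinitely often, needs more care than this reach-once example.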

Code & Models

Repositories

grockious/lcrl (Official)