Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning
Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, Miroslav Pajic

TL;DR
This paper introduces a model-free reinforcement learning framework for synthesizing control policies from linear temporal logic specifications in unknown stochastic environments. The learning algorithm is guaranteed to converge to policies that maximize the probability of satisfying the specification.
Contribution
It proposes a novel reward and path-dependent discounting mechanism derived from the LTL formula. This mechanism lets RL maximize the probability of satisfaction without learning the environment's transition probabilities.
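To make "path-dependent discounting" concrete, here is a minimal sketch of how a return with a state-dependent discount can be computed along a path. The specific choices below are assumptions modeled on this style of construction: the reward is 1 − γ_B on accepting states and 0 elsewhere, and the discount is γ_B on accepting states and γ elsewhere. The sketch also assumes the LTL formula has already been compiled into an automaton whose accepting states are marked on the state space. That compilation step is not shown, and the names `reward`, `discount`, and `path_return` are hypothetical. See the paper for the exact definitions and for the conditions on γ and γ_B.

```python
def reward(state, accepting, gamma_b):
    """Assumed reward: 1 - gamma_b on accepting states, 0 elsewhere."""
    return 1.0 - gamma_b if state in accepting else 0.0


def discount(state, accepting, gamma, gamma_b):
    """Assumed state-dependent discount: gamma_b on accepting states, gamma elsewhere."""
    return gamma_b if state in accepting else gamma


def path_return(path, accepting, gamma, gamma_b):
    """Discounted return of a finite path prefix.

    G = sum_i R(s_i) * prod_{j < i} Gamma(s_j), so the discount applied to a
    reward depends on which states the path has already visited.
    """
    total = 0.0
    scale = 1.0  # running product of the discounts of the states visited so far
    for state in path:
        total += scale * reward(state, accepting, gamma_b)
        scale *= discount(state, accepting, gamma, gamma_b)
    return total


# Example: "a" is an accepting state, "b" is not.
always = path_return(["a"] * 1000, {"a"}, gamma=0.999, gamma_b=0.99)
once = path_return(["a"] + ["b"] * 1000, {"a"}, gamma=0.999, gamma_b=0.99)
never = path_return(["b"] * 1000, {"a"}, gamma=0.999, gamma_b=0.99)
```

Under these assumptions, `always` equals 1 − 0.99^1000, which is very close to 1. `once` equals 1 − γ_B = 0.01 (up to floating-point rounding), and `never` is 0. A path that keeps revisiting accepting states collects a return near 1, while one that visits them only finitely often collects little. This illustrates the intuition for why maximizing such a return can track the probability of satisfying a Büchi-style acceptance condition.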
Findings
The model-free RL algorithm is guaranteed to converge to a policy that maximizes the probability of satisfying the LTL specification.
The method is demonstrated on two motion planning case studies.
Abstract
We present a reinforcement learning (RL) framework to synthesize a control policy from a given linear temporal logic (LTL) specification in an unknown stochastic environment that can be modeled as a Markov Decision Process (MDP). Specifically, we learn a policy that maximizes the probability of satisfying the LTL formula without learning the transition probabilities. We introduce a novel rewarding and path-dependent discounting mechanism based on the LTL formula such that (i) an optimal policy maximizing the total discounted reward effectively maximizes the probabilities of satisfying LTL objectives, and (ii) a model-free RL algorithm using these rewards and discount factors is guaranteed to converge to such a policy. Finally, we illustrate the applicability of our RL-based synthesis approach on two motion planning case studies.
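The claim that a model-free algorithm can use these rewards and discount factors can be illustrated with tabular Q-learning in which the discount in each update depends on the current state. The following is a minimal sketch, not the authors' implementation. The three-state environment, its transition probabilities, and all hyperparameters are hypothetical. State 1 stands in for an accepting sink and state 2 for a rejecting sink. The reward and discount choices (reward 1 − γ_B and discount γ_B on accepting states, and discount γ elsewhere) are assumptions. The learner only samples transitions and never reads the probabilities.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: from state 0, each action reaches the accepting
# sink (state 1) with some probability, otherwise the rejecting sink (state 2).
ACTIONS = ("safe", "risky")
ACCEPTING = {1}
SUCCESS_PROB = {"safe": 0.9, "risky": 0.5}  # used only by the simulator


def step(state, action, rng):
    """Sample a next state; states 1 and 2 are absorbing."""
    if state == 0:
        return 1 if rng.random() < SUCCESS_PROB[action] else 2
    return state


def q_learning(episodes=2000, horizon=50, alpha=0.05, gamma=0.99, gamma_b=0.9, seed=0):
    """Tabular Q-learning with an assumed state-dependent reward and discount."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = rng.choice(ACTIONS)  # uniform behaviour policy (Q-learning is off-policy)
            s_next = step(s, a, rng)
            r = 1.0 - gamma_b if s in ACCEPTING else 0.0
            disc = gamma_b if s in ACCEPTING else gamma  # discount depends on current state
            target = r + disc * max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
    return q


q = q_learning()
greedy = max(ACTIONS, key=lambda a: q[(0, a)])
print(greedy, q[(0, "safe")], q[(0, "risky")])
```

In this toy setting, the values of state 1 approach 1 and those of state 2 stay at 0. Q(0, safe) and Q(0, risky) hover around γ·0.9 and γ·0.5, so the greedy policy at state 0 should pick the action with the higher probability of reaching the accepting sink. The constant learning rate is a simplification. Convergence guarantees for Q-learning typically assume decaying step sizes.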
