TL;DR
This paper introduces a model-free reinforcement learning algorithm that synthesizes control policies to maximize the probability of satisfying linear temporal logic specifications in uncertain environments, using a novel automaton construction.
Contribution
It proposes an embedded limit deterministic generalized Büchi automaton (E-LDGBA), built with a synchronous tracking-frontier function, for efficient RL-based LTL control synthesis.
Findings
The method guarantees optimal satisfaction probability.
The method is effective in both simulation and real-world experiments.
It handles environment and motion uncertainties.
Abstract
This paper presents a model-free reinforcement learning (RL) algorithm to synthesize a control policy that maximizes the satisfaction probability of linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, we model the robot motion as a probabilistic labeled Markov decision process with unknown transition probabilities and unknown probabilistic label functions. The LTL task specification is converted to a limit deterministic generalized Büchi automaton (LDGBA) with several accepting sets to maintain dense rewards during learning. The novelty of applying LDGBA lies in constructing an embedded LDGBA (E-LDGBA) by designing a synchronous tracking-frontier function, which records the accepting sets not yet visited without increasing dimensional or computational complexity. With appropriate dependent reward and discount functions,…
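As a rough illustration of the tracking-frontier idea described in the abstract, the sketch below keeps a set of accepting-set indices that have not yet been visited and shrinks it as the automaton state changes. It is a minimal Python sketch, not the paper's definition: the function name `update_frontier`, the index-based representation, and the reset rule (refill the frontier once it is empty) are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Set, Tuple


def update_frontier(
    q: str,
    frontier: FrozenSet[int],
    accepting_sets: List[Set[str]],
) -> Tuple[FrozenSet[int], bool]:
    """Return the updated frontier and whether an unvisited accepting set was hit.

    q              -- current automaton state (hypothetical string label)
    frontier       -- indices of accepting sets not yet visited in this round
    accepting_sets -- the accepting sets of the automaton, as sets of state labels
    """
    # Assumed reset rule: once every accepting set has been visited,
    # begin a new round with all accepting sets unvisited again.
    if not frontier:
        frontier = frozenset(range(len(accepting_sets)))

    # Indices of still-unvisited accepting sets that contain the current state.
    hit = frozenset(i for i in frontier if q in accepting_sets[i])

    # Remove the sets just visited; the flag tells the caller whether progress was made.
    return frontier - hit, bool(hit)


# Usage: two accepting sets, visited in turn.
accepting = [{"q1"}, {"q2"}]
frontier = frozenset({0, 1})
frontier, hit = update_frontier("q1", frontier, accepting)  # visits set 0
frontier, hit = update_frontier("q2", frontier, accepting)  # visits set 1, frontier now empty
```

A learner could issue a reward whenever the returned flag is true, which is one plausible reading of how several accepting sets keep rewards dense; the reward and discount design itself is not shown here.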
