Learning-based Bounded Synthesis for Semi-MDPs with LTL Specifications
Ryohei Oura, Toshimitsu Ushio

TL;DR
This paper introduces a learning-based bounded synthesis approach for semi-Markov decision processes with LTL specifications, combining reinforcement learning and Bayesian inference to synthesize optimal policies with convergence guarantees.
Contribution
It presents a novel method integrating learning and formal synthesis for SMDPs with LTL, enabling policy synthesis with probabilistic and temporal guarantees.
Findings
Converges to optimal policy as data increases
Maximizes reachability probability of satisfying LTL
Minimizes long-term dwell time risk
Abstract
This letter proposes a learning-based bounded synthesis for a semi-Markov decision process (SMDP) with a linear temporal logic (LTL) specification. In the product of the SMDP and the deterministic -co-B\"uchi automaton (dcBA) converted from the LTL specification, we learn both the winning region of satisfying the LTL specification and the dynamics therein based on reinforcement learning and Bayesian inference. Then, we synthesize an optimal policy satisfying the following two conditions. (1) It maximizes the probability of reaching the wining region. (2) It minimizes a long-term risk for the dwell time within the winning region. The minimization of the long-term risk is done based on the estimated dynamics and a value iteration. We show that, if the discount factor is sufficiently close to one, the synthesized policy converges to the optimal policy as the number of the data…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFormal Methods in Verification
