TL;DR
This paper presents a novel end-to-end framework integrating linear temporal logic with differentiable simulators, enabling efficient gradient-based reinforcement learning from formal specifications in continuous control tasks.
Contribution
It introduces a differentiable reward mechanism for LTL, connecting formal methods with deep RL, and provides theoretical guarantees and empirical improvements in training speed and performance.
Findings
Accelerates training in complex control tasks
Achieves up to twice the returns of baseline methods
Maintains soundness of formal specifications in RL
Abstract
Ensuring that reinforcement learning (RL) controllers satisfy safety and reliability constraints in real-world settings remains challenging: state-avoidance and constrained Markov decision processes often fail to capture trajectory-level requirements or induce overly conservative behavior. Formal specification languages such as linear temporal logic (LTL) offer correct-by-construction objectives, yet their rewards are typically sparse, and heuristic shaping can undermine correctness. We introduce, to our knowledge, the first end-to-end framework that integrates LTL with differentiable simulators, enabling efficient gradient-based learning directly from formal specifications. Our method relaxes discrete automaton transitions via soft labeling of states, yielding differentiable rewards and state representations that mitigate the sparsity issue intrinsic to LTL while preserving objective…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
