Convergence Guarantee of Dynamic Programming for LTL Surrogate Reward
Zetong Xuan, Yu Wang

TL;DR
This paper proves that dynamic programming with surrogate rewards for LTL objectives in MDPs always converges exponentially, ensuring reliable satisfaction probability computation even with certain discount factors.
Contribution
It establishes a multi-step contraction property that guarantees exponential convergence of dynamic programming for LTL surrogate rewards.
Findings
Multi-step contraction guarantees convergence.
Exponential convergence of value function.
Reliable satisfaction probability computation.
Abstract
Linear Temporal Logic (LTL) is a formal way of specifying complex objectives for planning problems modeled as Markov Decision Processes (MDPs). The planning problem aims to find the optimal policy that maximizes the satisfaction probability of the LTL objective. One way to solve the planning problem is to use the surrogate reward with two discount factors and dynamic programming, which bypasses the graph analysis used in traditional model-checking. The surrogate reward is designed such that its value function represents the satisfaction probability. However, in some cases where one of the discount factors is set to for higher accuracy, the computation of the value function using dynamic programming is not guaranteed. This work shows that a multi-step contraction always exists during dynamic programming updates, guaranteeing that the approximate value function will converge…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsElevator Systems and Control
