Convergence Guarantee of Dynamic Programming for LTL Surrogate Reward

Zetong Xuan; Yu Wang

arXiv:2408.05438·cs.RO·August 13, 2024

Convergence Guarantee of Dynamic Programming for LTL Surrogate Reward

Zetong Xuan, Yu Wang

PDF

Open Access

TL;DR

This paper proves that dynamic programming with surrogate rewards for LTL objectives in MDPs always converges exponentially, ensuring reliable satisfaction probability computation even with certain discount factors.

Contribution

It establishes a multi-step contraction property that guarantees exponential convergence of dynamic programming for LTL surrogate rewards.

Findings

01

Multi-step contraction guarantees convergence.

02

Exponential convergence of value function.

03

Reliable satisfaction probability computation.

Abstract

Linear Temporal Logic (LTL) is a formal way of specifying complex objectives for planning problems modeled as Markov Decision Processes (MDPs). The planning problem aims to find the optimal policy that maximizes the satisfaction probability of the LTL objective. One way to solve the planning problem is to use the surrogate reward with two discount factors and dynamic programming, which bypasses the graph analysis used in traditional model-checking. The surrogate reward is designed such that its value function represents the satisfaction probability. However, in some cases where one of the discount factors is set to $1$ for higher accuracy, the computation of the value function using dynamic programming is not guaranteed. This work shows that a multi-step contraction always exists during dynamic programming updates, guaranteeing that the approximate value function will converge…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsElevator Systems and Control