Proper Policies in Infinite-State Stochastic Shortest Path Problems

Dimitri P. Bertsekas

arXiv:1711.10129 · math.OC · January 15, 2020 · IEEE Trans. Autom. Control

TL;DR

This paper extends the concept of a proper policy to stochastic shortest path problems with infinite state spaces and characterizes the resulting optimal cost functions as solutions of Bellman's equation.

Contribution

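It extends the notion of a proper policy (one that terminates within a finite expected number of steps) to infinite state spaces and characterizes the optimal cost functions, over all policies and over proper policies only, as the smallest and largest solutions of Bellman's equation, respectively, within a class of Lyapunov-like functions.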
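To make the definition concrete, here is a minimal sketch on a hypothetical chain over the states 0, 1, 2, ... (not taken from the paper): under a fixed policy, from state x the system reaches the termination state with probability p(x) and otherwise moves to x + 1. The function `truncated_expected_steps` is an illustrative name; it sums P(T > n) for n < horizon, which increases toward E[T]. With a constant p the sums settle at 1/p, so the policy is proper. With p(x) = 1/(x + 2), the survival probability from state 0 is P(T > n) = 1/(n + 1), so termination happens with probability one, yet the expected number of steps is the divergent harmonic series and the policy is not proper. In infinite state spaces these two notions can therefore differ.

```python
from typing import Callable


def truncated_expected_steps(p: Callable[[int], float], horizon: int, x0: int = 0) -> float:
    """Return sum_{n=0}^{horizon-1} P(T > n) for a hypothetical chain started at x0.

    From state x the chain terminates with probability p(x); otherwise it moves
    to x + 1. Because E[T] = sum_{n>=0} P(T > n), the returned value is a lower
    bound on E[T] that converges to E[T] (possibly +infinity) as horizon grows.
    """
    total = 0.0
    survive = 1.0  # P(T > n), starting from P(T > 0) = 1
    x = x0
    for _ in range(horizon):
        total += survive
        survive *= 1.0 - p(x)  # probability of not terminating from state x
        x += 1
    return total


if __name__ == "__main__":
    harmonic = lambda x: 1.0 / (x + 2)  # terminates w.p. 1, but E[T] = infinity
    for horizon in (10, 100, 1000):
        print(horizon, truncated_expected_steps(lambda x: 0.5, horizon),
              truncated_expected_steps(harmonic, horizon))
```

For instance, `truncated_expected_steps(lambda x: 0.25, 500)` is within rounding of 4.0, whereas the harmonic case keeps growing roughly like log(horizon).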

Findings

01

$J^*$, the optimal cost function, is the smallest solution of Bellman's equation within a suitable class of Lyapunov-like functions.

02

$\hat{J}$, the optimal cost function over proper policies only, is the largest solution of Bellman's equation within the same class.

03

Value iteration may converge to either $J^*$ or $\hat{J}$, depending on the initial condition.
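For reference, Bellman's equation for a stochastic shortest path problem is commonly written as follows. The system $x_{k+1} = f(x_k, u_k, w_k)$, the control constraint set $U(x)$, the cost per stage $g(x,u,w) \ge 0$, the termination state $t$, and the symbol $\Pi_p$ for the set of proper policies are assumed notation here, not quoted from the paper.

```latex
\begin{aligned}
J(x) &= \inf_{u \in U(x)} \mathbb{E}_w\bigl\{ g(x,u,w) + J\bigl(f(x,u,w)\bigr) \bigr\},
  \qquad x \neq t, \qquad J(t) = 0,\\
J^*(x) &= \inf_{\pi \in \Pi} J_\pi(x), \qquad
\hat{J}(x) = \inf_{\pi \in \Pi_p} J_\pi(x).
\end{aligned}
```

Since $\Pi_p \subset \Pi$, the inequality $J^* \le \hat{J}$ always holds. The findings sharpen this: every solution of Bellman's equation within the relevant class lies between $J^*$ and $\hat{J}$.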

Abstract

We consider stochastic shortest path problems with infinite state and control spaces, a nonnegative cost per stage, and a termination state. We extend the notion of a proper policy, a policy that terminates within a finite expected number of steps, from the context of finite state space to the context of infinite state space. We consider the optimal cost function $J^{*}$, and the optimal cost function $\hat{J}$ over just the proper policies. We show that $J^{*}$ and $\hat{J}$ are the smallest and largest solutions of Bellman's equation, respectively, within a suitable class of Lyapounov-like functions. If the cost per stage is bounded, these functions are those that are bounded over the effective domain of $\hat{J}$. The standard value iteration algorithm may be attracted to either $J^{*}$ or $\hat{J}$, depending on the initial condition.
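The last sentence can be illustrated on a hypothetical one-state toy problem. The problem is finite, so it only hints at the infinite-state theory. From state 1 the controller either pays 1 and moves to the termination state, or pays 0 and stays. Staying forever costs 0 but never terminates, so $J^*(1) = 0$. Every proper policy eventually pays the stopping cost, so $\hat{J}(1) = 1$. Bellman's equation reads $J(1) = \min\{1, J(1)\}$, whose nonnegative solutions are exactly the interval $[0, 1]$. Its smallest and largest solutions are therefore $J^*$ and $\hat{J}$. The names `bellman`, `value_iteration`, `STOP_COST`, and `STAY_COST` are illustrative assumptions.

```python
STOP_COST = 1.0  # cost of moving from state 1 to the termination state t
STAY_COST = 0.0  # cost of remaining at state 1 (improper if chosen forever)


def bellman(j: float) -> float:
    """Bellman operator at state 1 of the toy problem, using J(t) = 0."""
    return min(STOP_COST + 0.0, STAY_COST + j)


def value_iteration(j0: float, iterations: int = 50) -> float:
    """Apply the Bellman operator `iterations` times, starting from J_0(1) = j0."""
    j = j0
    for _ in range(iterations):
        j = bellman(j)
    return j


if __name__ == "__main__":
    for j0 in (0.0, 0.3, 5.0):
        print(j0, "->", value_iteration(j0))
```

Starting at 0, the iterates stay at $J^*(1) = 0$. Starting at any value of at least 1, they drop to $\hat{J}(1) = 1$ in one step. A start strictly between 0 and 1 is itself a fixed point. The limit therefore depends on the initial condition.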
