On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes
Huizhen Yu

TL;DR
This paper studies the convergence of value iteration in a broad class of total cost Markov decision processes whose one-stage costs may have arbitrary signs, extending known results and establishing new convergence conditions.
Contribution
The paper extends convergence results for value iteration to the General Convergence (GC) total cost model, covering one-stage costs of arbitrary sign and cases where convergence holds only for a subset of initial states.
Findings
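Value iteration converges when started from certain functions that lie above the optimal cost function.
Transfinite value iteration starting from zero converges in the special case where the optimal cost function is nonnegative.
Value iteration starting from zero converges partially, that is, for a subset of initial states.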
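In symbols, the findings can be sketched as follows. The notation is an assumption made for illustration and does not come from the summary itself: $c(x,a)$ denotes the one-stage cost, $q(dy \mid x,a)$ the transition kernel, $A(x)$ the admissible actions at state $x$, $T$ the dynamic programming operator, and $J^*$ the optimal cost function. The ordinal-indexed (transfinite) iteration of the second finding is not written out here.

```latex
% Value iteration: J_{k+1} = T J_k, with the (assumed) operator
(TJ)(x) = \inf_{a \in A(x)} \Big\{ c(x,a) + \int J(y)\, q(dy \mid x,a) \Big\}

% Finding 1: for certain functions J with J \ge J^*,
\lim_{k \to \infty} (T^k J)(x) = J^*(x) \quad \text{for all } x

% Finding 3: starting from the zero function,
\lim_{k \to \infty} (T^k 0)(x) = J^*(x) \quad \text{for } x \text{ in a subset of states}
```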
Abstract
We consider a general class of total cost Markov decision processes (MDP) in which the one-stage costs can have arbitrary signs, but the sum of the negative parts of the one-stage costs is finite for all policies and all initial states. We refer to this class as the General Convergence (GC for short) total cost model, and we study the convergence of value iteration for the GC model, in the Borel MDP framework with universally measurable policies. Our main results include: (i) convergence of value iteration when starting from certain functions above the optimal cost function; (ii) convergence of transfinite value iteration starting from zero, in the special case where the optimal cost function is nonnegative; and (iii) partial convergence of value iteration starting from zero, for a subset of initial states. These results extend several previously known results about the convergence of…
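To make the iteration concrete, here is a minimal sketch of undiscounted value iteration on a small finite MDP with mixed-sign costs. It is only a toy illustration: the paper works with Borel state spaces and universally measurable policies, and the three-state example, the function name `value_iteration`, and all numbers below are hypothetical choices, not taken from the paper.

```python
import numpy as np

def value_iteration(P, c, J0, max_iters=500, tol=1e-10):
    """Undiscounted total cost value iteration on a finite MDP.

    P:  array of shape (A, S, S); P[a, s, t] is the probability of s -> t under a.
    c:  array of shape (S, A); one-stage costs, any sign.
    J0: initial function, shape (S,).
    Returns the last iterate and the number of iterations performed.
    """
    J = np.asarray(J0, dtype=float)
    for k in range(max_iters):
        # Q[s, a] = c[s, a] + sum_t P[a, s, t] * J[t]
        Q = c + (P @ J).T
        J_next = Q.min(axis=1)
        if np.max(np.abs(J_next - J)) < tol:
            return J_next, k + 1
        J = J_next
    return J, max_iters

# Hypothetical example: states 0 and 1 are transient, state 2 is a
# cost-free absorbing state. One cost is negative (state 0, action 0).
P = np.array([
    [[0.0, 1.0, 0.0],   # action 0: 0 -> 1
     [0.0, 0.0, 1.0],   #           1 -> 2
     [0.0, 0.0, 1.0]],  #           2 -> 2
    [[0.0, 0.0, 1.0],   # action 1: 0 -> 2
     [0.5, 0.0, 0.5],   #           1 -> 0 or 2 with equal probability
     [0.0, 0.0, 1.0]],  #           2 -> 2
])
c = np.array([
    [-1.0, 2.0],
    [3.0, 1.0],
    [0.0, 0.0],
])

# Start from zero, and from a function above the fixed point.
J_zero, iters_zero = value_iteration(P, c, np.zeros(3))
J_above, iters_above = value_iteration(P, c, np.array([5.0, 5.0, 0.0]))
print(J_zero, iters_zero)
print(J_above, iters_above)
```

In this toy example both runs approach the same limit, although the iterates starting from zero oscillate rather than increase monotonically, which hints at why convergence from zero is delicate when costs can be negative.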
