Properties of Turnpike Functions for Discounted Finite MDPs
Eugene A. Feinberg, Gaojin He

TL;DR
This paper investigates the properties of turnpike functions in discounted finite MDPs, providing bounds on the number of iterations needed for value iteration to produce optimal policies, thus supporting the rolling horizon approach.
Contribution
The paper characterizes properties of turnpike integers in discounted finite MDPs and derives upper bounds on them, which limits how many value iterations are needed before every generated decision rule is optimal for the infinite-horizon problem.
Findings
Turnpike integers are finite and well-defined.
Upper bounds for turnpike integers are derived.
Results support the effectiveness of the rolling horizon approach.
Abstract
This paper studies discounted Markov Decision Processes (MDPs) with finite sets of states and actions. Value iteration is one of the major methods for finding optimal policies. For each discount factor, starting from a finite number of iterations, which is called the turnpike integer, value iteration algorithms always generate decision rules, which are deterministic optimal policies for the infinite-horizon problems. This fact justifies the rolling horizon approach for computing infinite-horizon optimal policies by conducting a finite number of value iterations. This paper describes properties of turnpike integers and provides their upper bounds.
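The idea in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' method: it runs value iteration on a small hypothetical two-state MDP and reports the first iteration after which every greedy decision rule observed within a finite window is optimal. The zero terminal values, the example MDP, the tolerance, the window `max_n`, and all function names are assumptions made for illustration; the paper's formal definition of the turnpike integer may differ in its details, and a finite window can only give an empirical estimate.

```python
import numpy as np

def q_values(P, r, beta, v):
    # P has shape (S, A, S), r has shape (S, A), v has shape (S,).
    # Returns Q(s, a) = r(s, a) + beta * sum_s' P(s' | s, a) * v(s').
    return r + beta * (P @ v)

def argmax_sets(Q, tol=1e-9):
    # For each state, the set of actions whose Q-value is within tol of the max.
    return [set(np.flatnonzero(row >= row.max() - tol)) for row in Q]

def empirical_turnpike(P, r, beta, max_n=200, tol=1e-9):
    # Estimate, over iterations 1..max_n only, the smallest N such that all
    # greedy decision rules at iterations n >= N are infinite-horizon optimal.
    # Assumption: value iteration starts from zero terminal values.
    n_states = r.shape[0]

    # Approximate the infinite-horizon value function by long value iteration.
    v_star = np.zeros(n_states)
    for _ in range(5000):
        v_star = q_values(P, r, beta, v_star).max(axis=1)
    optimal = argmax_sets(q_values(P, r, beta, v_star), tol)

    v = np.zeros(n_states)
    last_bad = 0
    for n in range(1, max_n + 1):
        Q = q_values(P, r, beta, v)
        greedy = argmax_sets(Q, tol)
        # Iteration n is "bad" if some greedy action is not optimal.
        if any(not g <= o for g, o in zip(greedy, optimal)):
            last_bad = n
        v = Q.max(axis=1)
    return last_bad + 1

# Hypothetical example MDP (not from the paper).
# State 0: action 0 stays in state 0 with reward 1; action 1 moves to state 1 with reward 0.
# State 1: both actions stay in state 1 with reward 2.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
r = np.array([[1.0, 0.0],
              [2.0, 2.0]])
beta = 0.9

result = empirical_turnpike(P, r, beta)
print("Empirical turnpike estimate:", result)
```

In a rolling horizon scheme, an upper bound on the turnpike integer plays the role that `max_n` cannot: it guarantees, rather than suggests, that running that many value iterations yields an optimal first decision rule.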
Taxonomy
Topics: Advanced Numerical Methods in Computational Mathematics
