On Value Iteration Convergence in Connected MDPs
Arsenii Mustafin, Alex Olshevsky, Ioannis Ch. Paschalidis

TL;DR
This paper proves that in connected Markov Decision Processes with a unique optimal policy, value iteration algorithms converge geometrically faster than the discount factor for both discounted and average-reward criteria.
Contribution
It establishes geometric convergence guarantees for value iteration in connected MDPs with a unique optimal policy, with rates that improve on the standard bound given by the discount factor γ.
Findings
Value iteration converges geometrically in connected MDPs.
The convergence rate is faster than the discount factor γ.
Results apply to both discounted and average-reward criteria.
Abstract
This paper establishes that an MDP with a unique optimal policy and ergodic associated transition matrix ensures the convergence of various versions of the Value Iteration algorithm at a geometric rate that exceeds the discount factor γ for both discounted and average-reward criteria.
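A minimal numerical sketch of the claim, under assumptions of our own. The two-state, two-action MDP is hypothetical, and convergence is measured by the span seminorm of successive differences, sp(d_k) = max(d_k) − min(d_k) with d_k = T(h_k) − h_k, which may not be the error measure used in the paper. For these numbers the unique optimal policy takes action 0 in state 0 and action 1 in state 1; its transition matrix [[0.9, 0.1], [0.6, 0.4]] is ergodic with eigenvalues 1 and 0.3. Plain value iteration (discounted, γ = 0.9) and relative value iteration (average reward, γ = 1) then shrink sp(d_k) by factors of about γ · 0.3 = 0.27 and 0.3 per step, both below γ. The helper names `bellman`, `span`, and `contraction_ratios` are illustrative and do not come from the paper.

```python
import numpy as np

# Hypothetical toy MDP with 2 states and 2 actions (numbers chosen for illustration).
# P[a, s, t] = probability of moving from state s to state t under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
])
# R[s, a] = immediate reward for taking action a in state s.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])


def bellman(V, gamma):
    """Apply the Bellman optimality operator; return new values and the greedy policy."""
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q.max(axis=1), Q.argmax(axis=1)


def span(x):
    """Span seminorm: max(x) - min(x)."""
    return float(np.max(x) - np.min(x))


def contraction_ratios(gamma, iters=15, relative=False, ref=0):
    """Run value iteration (or relative VI if relative=True) from h = 0 and return
    the ratios span(d_{k+1}) / span(d_k), where d_k = T(h_k) - h_k,
    together with the greedy policy found at the last iteration."""
    h = np.zeros(R.shape[0])
    spans = []
    for _ in range(iters):
        w, policy = bellman(h, gamma)
        spans.append(span(w - h))
        # Relative VI (average reward): subtract the value at a reference state
        # so the iterates stay bounded when gamma = 1.
        h = w - w[ref] if relative else w
    ratios = [b / a for a, b in zip(spans, spans[1:])]
    return ratios, policy


if __name__ == "__main__":
    disc, pi = contraction_ratios(0.9)
    print(pi, disc[-1])   # policy [0 1], ratio approx. 0.27 = 0.9 * 0.3
    avg, pi_avg = contraction_ratios(1.0, relative=True)
    print(pi_avg, avg[-1])  # policy [0 1], ratio approx. 0.3
```

In this toy case the mechanism is easy to see: the greedy policy never changes, so d_{k+1} = γ P_π d_k, and the component of d_k along the constant vector (eigenvalue 1) does not affect the span. The span therefore decays at γ |λ_2(P_π)| instead of γ.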
Taxonomy
Topics: Advanced Control Systems Optimization · Optimization and Variational Analysis
