On Value Iteration Convergence in Connected MDPs

Arsenii Mustafin; Alex Olshevsky; Ioannis Ch. Paschalidis

arXiv:2406.09592 · cs.LG · June 17, 2024

TL;DR

This paper proves that in connected Markov Decision Processes (MDPs) with a unique optimal policy, value iteration converges at a geometric rate faster than the discount factor γ, under both the discounted and the average-reward criteria.
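The Python sketch below illustrates how a rate better than γ can show up. It is not the authors' construction. It runs standard discounted value iteration on a small hypothetical MDP and measures how fast the span seminorm sp(x) = max(x) - min(x) of the successive differences V_{k+1} - V_k shrinks. It assumes every transition probability is strictly positive, which is stronger than the paper's unique-optimal-policy and ergodicity assumptions. It also relies on the classical bound sp(TV - TW) ≤ γ·α·sp(V - W), where α < 1 measures the worst-case non-overlap between transition rows. The toy MDP, the function names, and the parameters (four states, two actions, γ = 0.9, seed 0) are all hypothetical, and the quantity measured here may differ from the one the paper analyzes.

```python
import random


def random_positive_mdp(n_states=4, n_actions=2, seed=0):
    """Hypothetical toy MDP: P[a][s] is the next-state distribution for action a in
    state s, R[a][s] the reward. Every transition probability is strictly positive."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_actions):
        rows = []
        for _ in range(n_states):
            w = [1.0 + rng.random() for _ in range(n_states)]  # weights in [1, 2)
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
    R = [[rng.random() for _ in range(n_states)] for _ in range(n_actions)]
    return P, R


def bellman(V, P, R, gamma):
    """Discounted Bellman optimality operator:
    (TV)(s) = max_a [R(s, a) + gamma * sum_s' P(s' | s, a) V(s')]."""
    return [
        max(R[a][s] + gamma * sum(p * v for p, v in zip(P[a][s], V))
            for a in range(len(P)))
        for s in range(len(V))
    ]


def span(x):
    """Span seminorm max(x) - min(x); unchanged by adding a constant to every entry."""
    return max(x) - min(x)


def contraction_bound(P, gamma):
    """gamma * alpha, with alpha = 1 - min overlap sum_j min(p_j, q_j) over all pairs of
    state-action rows; classically span(TV - TW) <= gamma * alpha * span(V - W)."""
    rows = [row for action_rows in P for row in action_rows]
    min_overlap = min(sum(min(p, q) for p, q in zip(r1, r2))
                      for r1 in rows for r2 in rows)
    return gamma * (1.0 - min_overlap)


def empirical_span_rate(P, R, gamma, iters=12):
    """Geometric-mean per-step shrink factor of span(V_{k+1} - V_k), starting from V_0 = 0."""
    V = [0.0] * len(R[0])
    diffs = []
    for _ in range(iters):
        V_next = bellman(V, P, R, gamma)
        diffs.append(span([a - b for a, b in zip(V_next, V)]))
        V = V_next
    return (diffs[-1] / diffs[0]) ** (1.0 / (len(diffs) - 1))


if __name__ == "__main__":
    P, R = random_positive_mdp()
    gamma = 0.9
    print("empirical span rate:", empirical_span_rate(P, R, gamma))
    print("bound gamma * alpha:", contraction_bound(P, gamma))
    print("discount factor:    ", gamma)
```

Because every row here is strictly positive, each step shrinks the span by at most γα < γ, so the reported rate stays below γ. Adding a constant to V does not change the greedy policy, and the span ignores such constants. The sup-norm error ‖V_k - V*‖∞ itself generally still decays only at rate γ.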

Contribution

It establishes geometric convergence guarantees for several versions of value iteration in connected MDPs with a unique optimal policy, with rates that improve on the classical γ-contraction bound.

Findings

01

Value iteration converges geometrically in connected MDPs.

02

The convergence rate is faster than the discount factor γ.

03

Results apply to both discounted and average-reward criteria.

Abstract

This paper establishes that an MDP with a unique optimal policy and ergodic associated transition matrix ensures the convergence of various versions of the Value Iteration algorithm at a geometric rate that exceeds the discount factor γ for both discounted and average-reward criteria.
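For the average-reward criterion, one common variant is relative value iteration. It applies the undiscounted Bellman operator T and then subtracts the value at a fixed reference state so the iterates stay bounded. The sketch below is an illustration under stated assumptions, not necessarily one of the variants the paper analyzes. The MDP, the function names, and the parameters (four states, two actions, 30 iterations, reference state 0, seed 1) are hypothetical. Strictly positive transitions are again assumed so that every policy's chain is ergodic. Under that assumption, min(Th - h) ≤ g* ≤ max(Th - h) for every h. The width of this bracket on the optimal gain g* therefore serves as a convergence certificate that shrinks geometrically.

```python
import random


def normalize(w):
    """Scale nonnegative weights so they sum to 1."""
    total = sum(w)
    return [x / total for x in w]


def random_positive_mdp(n_states=4, n_actions=2, seed=1):
    """Hypothetical toy MDP with strictly positive transitions, so every stationary
    policy induces an ergodic chain and the optimal gain g* is the same in every state."""
    rng = random.Random(seed)
    P = [[normalize([1.0 + rng.random() for _ in range(n_states)])
          for _ in range(n_states)]
         for _ in range(n_actions)]
    R = [[rng.random() for _ in range(n_states)] for _ in range(n_actions)]
    return P, R


def undiscounted_bellman(h, P, R):
    """(Th)(s) = max_a [R(s, a) + sum_s' P(s' | s, a) h(s')]  (no discounting)."""
    return [
        max(R[a][s] + sum(p * v for p, v in zip(P[a][s], h)) for a in range(len(P)))
        for s in range(len(h))
    ]


def relative_value_iteration(P, R, iters=30, ref=0):
    """Relative value iteration: apply T, then subtract the value at state `ref`.
    Returns the last gain bracket (lower, upper) and the bracket width per iteration."""
    h = [0.0] * len(R[0])
    widths = []
    for _ in range(iters):
        Th = undiscounted_bellman(h, P, R)
        delta = [a - b for a, b in zip(Th, h)]
        # With every policy ergodic, min(Th - h) <= g* <= max(Th - h) for any h.
        lower, upper = min(delta), max(delta)
        widths.append(upper - lower)
        h = [x - Th[ref] for x in Th]  # shift so that h[ref] == 0
    return lower, upper, widths


if __name__ == "__main__":
    P, R = random_positive_mdp()
    lower, upper, widths = relative_value_iteration(P, R)
    print(f"optimal gain g* lies in [{lower:.10f}, {upper:.10f}]")
    print("first bracket widths:", [f"{w:.1e}" for w in widths[:5]])
```

In this toy setting the bracket width shrinks each step by at most α = 1 - (minimum overlap between transition rows). That factor is below 1 only because every transition probability is assumed positive. The paper's assumptions are weaker than this.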

Taxonomy

Topics: Advanced Control Systems Optimization · Optimization and Variational Analysis