Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases
Arsenii Mustafin, Xinyi Sheng, Dominik Baumann

TL;DR
This paper provides a unified geometric analysis showing that, under the assumption of a unique and unichain optimal policy, value iteration converges geometrically in both the discounted and average-reward settings, and at a faster rate than previous analyses suggest.
Contribution
The paper introduces a unified geometric framework that establishes faster convergence of value iteration in both settings, challenging earlier results suggesting that only sublinear convergence can be expected in the average-reward case.
Findings
Convergence is geometric in both discounted and average-reward cases.
The convergence rate is faster than previous theoretical bounds suggest.
Both results rely on the assumption of a unique and unichain optimal policy.
Abstract
While Value Iteration (VI) is one of the most fundamental algorithms in Reinforcement Learning, its theoretical convergence guarantees still exhibit a persistent mismatch with empirical behavior. In the discounted-reward case, classical theory guarantees geometric convergence with rate γ, while in the average-reward case recent work suggests that only sublinear convergence can be expected. In practice, however, VI is often observed to converge significantly faster. In this work, we show through a unified geometry-based analysis that, under an assumption of a unique and unichain optimal policy, (i) convergence is geometric in both the discounted- and average-reward settings and (ii) the convergence rate is faster than previous analyses suggest.
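As a rough illustration of the classical discounted-case guarantee mentioned in the abstract, the following minimal sketch runs value iteration on a small toy MDP and prints per-step error ratios. It is not the paper's analysis or code. The two-state MDP, its numbers, and all function names are hypothetical and chosen only for illustration. The reference solution is approximated by running many extra iterations, and the span (maximum minus minimum) of the error is printed next to the sup-norm error as an extra quantity for comparison.

```python
# Toy MDP (hypothetical numbers, chosen only for illustration).
# TRANSITIONS[s][a] is the next-state distribution; REWARDS[s][a] is the reward.
TRANSITIONS = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
REWARDS = [
    [1.0, 0.0],
    [0.5, 2.0],
]
GAMMA = 0.9  # discount factor


def bellman_update(values, transitions, rewards, gamma):
    """Apply one Bellman optimality backup to every state."""
    new_values = []
    for s in range(len(values)):
        action_values = [
            rewards[s][a]
            + gamma * sum(p * v for p, v in zip(transitions[s][a], values))
            for a in range(len(transitions[s]))
        ]
        new_values.append(max(action_values))
    return new_values


def sup_norm(diff):
    """Largest absolute entry of a vector."""
    return max(abs(x) for x in diff)


def span(diff):
    """Span seminorm: largest entry minus smallest entry."""
    return max(diff) - min(diff)


def value_iteration_errors(transitions, rewards, gamma, num_steps,
                           reference_steps=5000):
    """Return sup-norm and span errors of V_0, ..., V_num_steps.

    The reference value function is approximated by running
    `reference_steps` backups, which is an assumption of this sketch.
    """
    n = len(transitions)
    v_ref = [0.0] * n
    for _ in range(reference_steps):
        v_ref = bellman_update(v_ref, transitions, rewards, gamma)

    values = [0.0] * n
    sup_errors, span_errors = [], []
    for _ in range(num_steps + 1):
        diff = [v - w for v, w in zip(values, v_ref)]
        sup_errors.append(sup_norm(diff))
        span_errors.append(span(diff))
        values = bellman_update(values, transitions, rewards, gamma)
    return sup_errors, span_errors


sup_errors, span_errors = value_iteration_errors(
    TRANSITIONS, REWARDS, GAMMA, num_steps=10
)
for k in range(1, len(sup_errors)):
    sup_ratio = sup_errors[k] / sup_errors[k - 1]
    # Guard against a zero denominator once the span error has vanished.
    span_ratio = (span_errors[k] / span_errors[k - 1]
                  if span_errors[k - 1] > 0 else 0.0)
    print(f"step {k}: sup-norm ratio {sup_ratio:.4f}, span ratio {span_ratio:.4f}")
```

By the contraction property of the Bellman optimality operator, each printed ratio is bounded by γ up to floating-point error. On a given MDP the observed ratios, particularly for the span, may be noticeably smaller than that bound, which is the kind of gap between guarantee and observed behavior the abstract describes.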
