Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases

Arsenii Mustafin; Xinyi Sheng; Dominik Baumann

arXiv:2510.23914·cs.LG·March 12, 2026

TL;DR

This paper provides a unified, geometry-based analysis showing that value iteration converges faster than previously thought in both discounted and average-reward reinforcement learning, under the assumption that the optimal policy is unique and unichain.

Contribution

It introduces a unified geometric framework showing faster convergence of value iteration in both settings, challenging recent results suggesting that only sublinear convergence can be expected in the average-reward case.
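
For orientation, the classical discounted guarantee and the span seminorm commonly used to measure progress in the undiscounted case can be written as below. These are standard textbook statements for finite MDPs, not the paper's new bounds; the notation ($T_\gamma$, $r$, $P$, $\mathrm{sp}$, $e_k$) is chosen here for illustration.

```latex
% Bellman optimality operator on a finite MDP (gamma = 1 gives the undiscounted operator)
(T_\gamma V)(s) = \max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big],
\qquad V_{k+1} = T_\gamma V_k .

% Discounted case (0 \le \gamma < 1): T_\gamma is a gamma-contraction in the sup norm, so
\| V_k - V^\star \|_\infty \le \gamma^k \, \| V_0 - V^\star \|_\infty .

% Average-reward case (\gamma = 1): V_k grows roughly like k \rho^\star, so progress is
% usually measured with the span seminorm of successive differences
\mathrm{sp}(v) = \max_s v(s) - \min_s v(s), \qquad e_k = \mathrm{sp}(V_{k+1} - V_k).

% "Geometric" convergence: e_k \le C q^k for some C > 0 and q \in (0,1).
% "Sublinear" convergence: slower than any such bound, e.g. e_k of order 1/k.
```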

Findings

01. Convergence is geometric in both the discounted- and average-reward cases.

02. The convergence rate is faster than previous theoretical bounds suggest.

03. The key assumption is that the optimal policy is unique and unichain.
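
Finding 03 hinges on the unichain property: a policy is unichain if its induced Markov chain has a single recurrent class, plus possibly some transient states. The sketch below checks this for one fixed deterministic policy. The function names and the array layout `P[a, s, s']` are assumptions for illustration, not the authors' code, and checking that the optimal policy is unique is a separate question not covered here.

```python
import numpy as np


def policy_matrix(P, pi):
    """Transition matrix of a deterministic policy (hypothetical helper).

    P: (A, S, S) array with P[a, s, t] = Pr(t | s, a); pi: (S,) int actions.
    Row s of the result is P[pi[s], s, :].
    """
    return P[pi, np.arange(P.shape[1])]


def recurrent_classes(P_pi, tol=0.0):
    """Return the recurrent classes of a finite Markov chain.

    In a finite chain, a communicating class is recurrent iff it is
    closed, i.e. no state outside the class is reachable from it.
    """
    S = P_pi.shape[0]
    # Boolean transitive closure (Warshall): reach[i, j] means j is reachable from i.
    reach = (P_pi > tol) | np.eye(S, dtype=bool)
    for k in range(S):
        reach = reach | (reach[:, [k]] & reach[[k], :])
    classes, seen = [], set()
    for s in range(S):
        if s in seen:
            continue
        # Communicating class of s: states reachable from s that can reach s back.
        cls = [t for t in range(S) if reach[s, t] and reach[t, s]]
        seen.update(cls)
        closed = all(set(np.flatnonzero(reach[t]).tolist()) <= set(cls) for t in cls)
        if closed:
            classes.append(cls)
    return classes


def is_unichain(P_pi):
    """A chain is unichain iff it has exactly one recurrent class."""
    return len(recurrent_classes(P_pi)) == 1


# Usage: states 0 and 1 form one closed class, state 2 is transient.
P_demo = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.0, 0.5]])
print(recurrent_classes(P_demo), is_unichain(P_demo))
```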

Abstract

While Value Iteration (VI) is one of the most fundamental algorithms in Reinforcement Learning, its theoretical convergence guarantees still exhibit a persistent mismatch with empirical behavior. In the discounted-reward case, classical theory guarantees geometric convergence with rate $\gamma$, while in the average-reward case recent work suggests that only sublinear convergence can be expected. In practice, however, VI is often observed to converge significantly faster. In this work, we show through a unified geometry-based analysis that, under an assumption of a unique and unichain optimal policy, (i) convergence is geometric in both the discounted- and average-reward settings and (ii) the convergence rate is faster than previous analyses suggest.
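
A minimal sketch of the two iterations the abstract compares, assuming a small tabular MDP stored as NumPy arrays. The names `bellman`, `value_iteration` and `relative_value_iteration`, the random test MDP, and the use of relative VI (subtracting the value at a reference state to keep undiscounted iterates bounded) are illustrative assumptions, not the authors' code or experiments.

```python
import numpy as np


def bellman(P, R, V, gamma):
    """Apply the Bellman optimality operator once.

    P: (A, S, S) array, P[a, s, t] = Pr(next state t | state s, action a)
    R: (S, A) array of expected one-step rewards
    V: (S,) current value vector
    """
    # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q.max(axis=1)


def value_iteration(P, R, gamma, iters):
    """Discounted VI; records the sup-norm residual ||V_{k+1} - V_k|| per step."""
    V = np.zeros(R.shape[0])
    residuals = []
    for _ in range(iters):
        V_next = bellman(P, R, V, gamma)
        residuals.append(np.max(np.abs(V_next - V)))
        V = V_next
    return V, residuals


def relative_value_iteration(P, R, iters, ref_state=0):
    """Undiscounted (average-reward) relative VI.

    Records the span of T(h) - h at each step as the error measure.
    """
    h = np.zeros(R.shape[0])
    spans = []
    gain = 0.0
    for _ in range(iters):
        Th = bellman(P, R, h, 1.0)
        diff = Th - h
        spans.append(diff.max() - diff.min())
        gain = 0.5 * (diff.max() + diff.min())  # midpoint of min/max of T(h) - h
        h = Th - Th[ref_state]  # renormalize so that h[ref_state] == 0
    return h, gain, spans


# Hypothetical example: strictly positive transitions, so every policy
# induces an irreducible, aperiodic (hence unichain) chain.
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))

V, residuals = value_iteration(P, R, gamma=0.9, iters=300)
h, gain, spans = relative_value_iteration(P, R, iters=500)

print("discounted residual ratios:", np.round(np.array(residuals[1:6]) / np.array(residuals[:5]), 3))
print("average-reward spans:", np.round(spans[:5], 6))
print("estimated optimal gain:", round(gain, 6))
```

By the contraction argument each printed discounted ratio is at most $\gamma = 0.9$; on a particular instance it can be noticeably smaller. The strictly positive transitions are chosen to sidestep periodicity, which can prevent undiscounted VI from settling.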
