Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

Donghwan Lee

arXiv:2604.17457 · math.OC · May 6, 2026

TL;DR

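This paper treats discounted Q-value iteration (Q-VI) as a switching system and shows that it reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal X_1=Q^*+\operatorname{span}(\mathbf 1)\). The distance to \(\mathcal X_1\) decays exponentially at a rate governed by a joint spectral radius.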

Contribution

It introduces a geometric and spectral analysis of Q-VI that establishes finite-time policy identification and gives conditions under which convergence is faster than the classical \(\gamma\) rate.

Findings

01. Q-VI reaches the optimal action class in finite time.

02. The distance to the invariant set \(\mathcal X_1\) decreases exponentially, at a rate \((\bar\rho+\varepsilon)^k\) set by the joint spectral radius \(\bar\rho\) of the projected switching family.

03. Spectral and graph-theoretic conditions determine when convergence is faster than the classical \(\gamma\) rate.

Abstract

Q-value iteration (Q-VI) is usually analyzed through the \(\gamma\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set (POSS), the set of \(Q\)-functions whose tie-broken greedy policies are optimal. The main result shows that Q-VI reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal X_1=Q^*+\operatorname{span}(\mathbf 1)\), which is contained in the POSS. For every \(\varepsilon>0\), the distance to \(\mathcal X_1\) satisfies an exponential bound with rate \((\bar\rho+\varepsilon)^k\), where \(\bar\rho\) is the joint spectral radius of the projected switching family restricted to directions transverse to \(\mathcal X_1\). When…
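The gap between policy identification and value convergence can be observed on a small example. The sketch below is not taken from the paper: it assumes a hypothetical random tabular MDP (dense transitions, rewards in [0, 1)), approximates \(Q^*\) by running Q-VI for many steps, and measures the distance to \(Q^*+\operatorname{span}(\mathbf 1)\) in the Euclidean norm, which is a choice made here for convenience. All function names are illustrative. Adding a constant to every entry of \(Q\) leaves each state's argmax unchanged, which is why proximity to this shifted line, rather than to \(Q^*\) itself, is what matters for the greedy policy.

```python
import random


def make_random_mdp(n_states, n_actions, seed):
    """Hypothetical toy MDP: dense random transitions, rewards in [0, 1)."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_row, R_row = [], []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            P_row.append([w / total for w in weights])  # normalize to a distribution
            R_row.append(rng.random())
        P.append(P_row)
        R.append(R_row)
    return P, R


def bellman_update(Q, P, R, gamma):
    """One Q-VI step: (TQ)(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    V = [max(row) for row in Q]
    return [
        [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def greedy_policy(Q):
    """Greedy action per state; ties go to the lowest action index."""
    return [row.index(max(row)) for row in Q]


def sup_error(Q, Q_ref):
    """Sup-norm distance between two Q-tables."""
    return max(abs(q - r) for row, ref in zip(Q, Q_ref) for q, r in zip(row, ref))


def dist_to_shift_line(Q, Q_ref):
    """Euclidean distance from Q to {Q_ref + c * 1}; the norm is an assumption."""
    diff = [q - r for row, ref in zip(Q, Q_ref) for q, r in zip(row, ref)]
    mean = sum(diff) / len(diff)  # best constant shift c
    return sum((d - mean) ** 2 for d in diff) ** 0.5


def run_experiment(n_states=5, n_actions=3, gamma=0.95, iters=2000, seed=0, tol=1e-8):
    P, R = make_random_mdp(n_states, n_actions, seed)
    zero = [[0.0] * n_actions for _ in range(n_states)]

    # Numerical stand-in for Q*: iterate until gamma**iters is negligible.
    Q_star = zero
    for _ in range(iters):
        Q_star = bellman_update(Q_star, P, R, gamma)
    pi_star = greedy_policy(Q_star)

    Q, matches, errors, tube = zero, [], [], []
    for _ in range(iters):
        matches.append(greedy_policy(Q) == pi_star)
        errors.append(sup_error(Q, Q_star))
        tube.append(dist_to_shift_line(Q, Q_star))
        Q = bellman_update(Q, P, R, gamma)

    # First iteration from which the greedy policy stays equal to pi_star.
    mismatches = [k for k, ok in enumerate(matches) if not ok]
    identify_k = mismatches[-1] + 1 if mismatches else 0
    # First iteration at which the sup-norm error drops below tol.
    converge_k = next(k for k, e in enumerate(errors) if e < tol)
    return identify_k, converge_k, errors, tube


if __name__ == "__main__":
    identify_k, converge_k, errors, tube = run_experiment()
    print("greedy policy optimal from iteration", identify_k)
    print("sup-norm error below tolerance at iteration", converge_k)
    for k in (0, 5, 10, 20):
        print(k, "sup error:", errors[k], "distance to shifted line:", tube[k])
```

Comparing the two printed iteration counts shows how much earlier the greedy policy settles than the values do on this particular instance. The sketch does not compute the joint spectral radius \(\bar\rho\), so it illustrates the phenomenon but does not verify the paper's rate.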
