Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration
Lars van der Laan, Nathan Kallus

TL;DR
This paper introduces stationary-reweighted soft FQI, a method that achieves local convergence in offline reinforcement learning without relying on Bellman completeness, by leveraging stationary norm alignment.
Contribution
It identifies a stability mechanism, local stationary norm alignment, that takes the place of Bellman completeness, and develops a reweighting technique that ensures local convergence without Bellman completeness.
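As a rough illustration of what the reweighting targets, the sketch below assumes a small tabular MDP with a known transition tensor `P[s, a, s']` and a soft-greedy policy π(a|s) ∝ exp(Q(s,a)/τ). It computes the policy's stationary state-action distribution d^π and the density ratio w = d^π/μ relative to a behavior distribution μ; using w as regression weights turns a behavior-norm projection into a stationary-norm projection. The function names are hypothetical, and the known model is an assumption made only to make the target quantity concrete. The paper's procedure for estimating these weights from offline data is not described in this summary.

```python
import numpy as np


def softmax_policy(q: np.ndarray, tau: float) -> np.ndarray:
    """Soft-greedy policy pi(a|s) proportional to exp(Q(s, a) / tau); q has shape (S, A)."""
    z = q / tau
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def stationary_state_action(P: np.ndarray, pi: np.ndarray) -> np.ndarray:
    """Stationary distribution d(s, a) of the state-action chain induced by pi.

    Assumes the chain is irreducible so the stationary distribution is unique.
    P has shape (S, A, S) with P[s, a, s'] = Pr(s' | s, a).
    """
    S, A, _ = P.shape
    n = S * A
    # P_pi[(s, a), (s', a')] = P(s' | s, a) * pi(a' | s'), rows indexed by s * A + a
    P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(n, n)
    # Solve d^T P_pi = d^T together with sum(d) = 1 as one least-squares system.
    lhs = np.vstack([P_pi.T - np.eye(n), np.ones((1, n))])
    rhs = np.concatenate([np.zeros(n), [1.0]])
    d, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return d.reshape(S, A)


def stationary_weights(d: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Density ratio w(s, a) = d(s, a) / mu(s, a); requires behavior coverage (mu > 0)."""
    if np.any(mu <= 0):
        raise ValueError("behavior distribution must cover every state-action pair")
    return d / mu


# Toy usage: 3 states, 2 actions, next state uniform regardless of (s, a).
S, A = 3, 2
P = np.full((S, A, S), 1.0 / S)
q = np.array([[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]])
pi = softmax_policy(q, tau=1.0)
d = stationary_state_action(P, pi)
mu = np.full((S, A), 1.0 / (S * A))  # uniform behavior distribution
w = stationary_weights(d, mu)
```

In this toy chain the next state ignores (s, a), so d(s, a) = π(a|s)/S and, under a uniform behavior distribution, the weights reduce to A·π(a|s): the reweighting simply upweights the actions the soft policy favors.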
Findings
Finite-sample local linear convergence under realizability.
Ordinary soft FQI is stable on-policy without Bellman completeness.
Temperature annealing acts as a continuation method to reach contraction regions.
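The continuation idea in the last finding can be sketched as a temperature schedule: solve at a large τ, where the soft problem is smoother, then lower τ stage by stage, warm-starting each stage from the previous solution. For brevity the sketch below uses exact tabular soft value iteration with a known model (an assumption; it involves no fitted regression, so it shows only the mechanics of the schedule, not the paper's contraction-region argument). The function names and the schedule `taus` are hypothetical.

```python
import numpy as np


def soft_bellman(q, P, r, gamma, tau):
    """Exact tabular soft Bellman operator.

    (T q)(s, a) = r(s, a) + gamma * sum_s' P(s'|s, a) * tau * log sum_a' exp(q(s', a') / tau)
    """
    z = q / tau
    m = z.max(axis=1)
    v = tau * (m + np.log(np.exp(z - m[:, None]).sum(axis=1)))  # soft state values, shape (S,)
    return r + gamma * (P @ v)  # P: (S, A, S) @ v: (S,) -> (S, A)


def annealed_soft_iteration(P, r, gamma, taus, inner_iters=200):
    """Continuation over a decreasing temperature schedule, warm-starting each stage."""
    q = np.zeros_like(r)
    for tau in taus:
        # Each stage starts from the previous stage's solution rather than from zero.
        for _ in range(inner_iters):
            q = soft_bellman(q, P, r, gamma, tau)
    return q


# Two states, two actions: action 0 stays, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
r = np.array([[1.0, 0.0], [2.0, 0.0]])
q = annealed_soft_iteration(P, r, gamma=0.9, taus=[1.0, 0.3, 0.1, 0.01])
```

At the final temperature τ = 0.01 the soft backup is nearly the hard max backup, so the greedy actions match the unregularized optimum: switch in state 0, stay in state 1.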
Abstract
Fitted Q-iteration (FQI) and soft FQI are widely used value-based methods for offline reinforcement learning, but their standard stability guarantees often depend on Bellman completeness, a strong closure condition that can fail under function approximation. We analyze soft FQI without Bellman completeness and identify the stability mechanism that replaces it: local stationary norm alignment. Near the soft-optimal fixed point, the soft Bellman operator has the same first-order behavior as the policy-evaluation operator for the soft-optimal policy. This operator contracts in the policy's stationary state-action norm, whereas standard fitted regression projects Bellman targets in the behavior norm. This mismatch explains instability under distribution shift. We use this insight to develop stationary-reweighted soft FQI, which reweights each regression step toward the stationary…
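The norm mismatch described in the abstract can be made concrete with a minimal sketch of one soft FQI step under linear features. It assumes a dataset of transitions with features `phi` for each sampled (s, a) and `phi_next` for every action at the sampled next state. Each step forms soft Bellman targets y = r + γ τ log Σ_a' exp(Q(s', a')/τ) and fits them by least squares. With unit weights the fit is a projection in the empirical behavior norm; passing weights proportional to an estimate of d^π/μ (not computed here) makes it a projection in the stationary norm, which is the idea behind the reweighting. All names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def soft_value(q_next: np.ndarray, tau: float) -> np.ndarray:
    """Soft state value tau * log sum_a exp(Q(s', a) / tau), row-wise and numerically stable."""
    z = q_next / tau
    m = z.max(axis=1, keepdims=True)
    return tau * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))


def soft_fqi_step(theta, phi, rewards, phi_next, gamma, tau, weights=None):
    """One (optionally reweighted) soft FQI regression step with linear Q(s, a) = phi(s, a) @ theta.

    phi:      (n, d) features of the sampled (s, a) pairs
    phi_next: (n, A, d) features of (s', a') for every action a' at the sampled next state s'
    weights:  (n,) nonnegative regression weights; None means uniform (behavior norm)
    """
    q_next = phi_next @ theta  # (n, A)
    targets = rewards + gamma * soft_value(q_next, tau)
    if weights is None:
        weights = np.ones_like(rewards)
    sw = np.sqrt(weights)
    # Weighted least squares: minimize sum_i w_i * (phi_i @ theta - y_i)^2.
    theta_new, *_ = np.linalg.lstsq(phi * sw[:, None], targets * sw, rcond=None)
    return theta_new


# Misspecified toy model: a single constant feature, so each step fits one number.
rewards = np.array([1.0, 0.0, 2.0, 0.0])
phi = np.ones((4, 1))
phi_next = np.ones((4, 2, 1))
theta0 = np.zeros(1)
uniform = soft_fqi_step(theta0, phi, rewards, phi_next, gamma=0.9, tau=0.5)
reweighted = soft_fqi_step(theta0, phi, rewards, phi_next, gamma=0.9, tau=0.5,
                           weights=np.array([0.1, 0.1, 3.0, 0.1]))
```

Under this one-feature model the two fits differ, with the reweighted fit pulled toward the heavily weighted sample. With one-hot tabular features and one sample per state-action pair, both fits would reproduce the exact soft Bellman backup, so the choice of projection norm only matters when the function class cannot represent the targets.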
