Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration

Lars van der Laan; Nathan Kallus

arXiv:2512.23927·stat.ML·May 11, 2026

TL;DR

This paper introduces stationary-reweighted soft FQI, an offline reinforcement learning method that converges locally without Bellman completeness by aligning each regression step with the stationary state-action norm in which the soft Bellman operator contracts near its fixed point.

Contribution

The paper identifies local stationary norm alignment as the stability mechanism that replaces Bellman completeness, and develops a reweighting technique that yields local convergence of soft FQI without that condition.

Findings

01. Finite-sample local linear convergence under realizability.

02. Ordinary soft FQI is stable on-policy without Bellman completeness.

03. Temperature annealing acts as a continuation method to reach contraction regions.
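
To make the role of the temperature concrete, here is a minimal sketch assuming the standard entropy-regularized (log-sum-exp) form of the soft value, which this page does not spell out. The names `soft_value` and `soft_policy` are hypothetical helpers, not the authors' code. The sketch only shows how the soft value and the softmax policy change as the temperature is lowered; it is not the paper's annealing schedule.

```python
import numpy as np

def soft_value(q, tau):
    """Soft value tau * log(sum_a exp(q[a] / tau)), computed stably.

    Assumption: standard log-sum-exp definition over the last axis.
    """
    q = np.asarray(q, dtype=float)
    m = q.max(axis=-1, keepdims=True)  # subtract the max to avoid overflow
    lse = np.log(np.exp((q - m) / tau).sum(axis=-1, keepdims=True))
    return (m + tau * lse).squeeze(-1)

def soft_policy(q, tau):
    """Softmax policy proportional to exp(q[a] / tau) over the last axis."""
    q = np.asarray(q, dtype=float)
    p = np.exp((q - q.max(axis=-1, keepdims=True)) / tau)
    return p / p.sum(axis=-1, keepdims=True)

# Illustrative Q-values for one state with three actions (made-up numbers).
q = np.array([1.0, 2.0, 3.0])
for tau in [2.0, 1.0, 0.5, 0.1]:
    # As tau shrinks, the soft value approaches max(q) and the policy sharpens.
    print(tau, float(soft_value(q, tau)), soft_policy(q, tau).round(3))
```

A high temperature gives a smoother operator and a more diffuse policy, while a low temperature approaches the hard max. Lowering the temperature gradually is one way to read the "continuation" in the finding above.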

Abstract

Fitted $Q$-iteration (FQI) and soft FQI are widely used value-based methods for offline reinforcement learning, but their standard stability guarantees often depend on Bellman completeness, a strong closure condition that can fail under function approximation. We analyze soft FQI without Bellman completeness and identify the stability mechanism that replaces it: local stationary norm alignment. Near the soft-optimal fixed point, the soft Bellman operator has the same first-order behavior as the policy-evaluation operator for the soft-optimal policy. This operator contracts in the policy's stationary state-action norm, whereas standard fitted regression projects Bellman targets in the behavior norm. This mismatch explains instability under distribution shift. We use this insight to develop stationary-reweighted soft FQI, which reweights each regression step toward the stationary…
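
The regression step described in the abstract can be sketched as a weighted least-squares fit onto soft Bellman targets. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a linear model `Q(s, a) = phi(s, a) @ theta`, a log-sum-exp soft value, and per-sample weights that are simply given. How the paper obtains its weights is not shown in the truncated abstract. The function name `weighted_soft_fqi_step` and all argument names are hypothetical.

```python
import numpy as np

def soft_value(q, tau):
    """Row-wise tau * log(sum_a exp(q[:, a] / tau)), computed stably."""
    m = q.max(axis=1, keepdims=True)
    return (m + tau * np.log(np.exp((q - m) / tau).sum(axis=1, keepdims=True)))[:, 0]

def weighted_soft_fqi_step(theta, phi_sa, phi_next, rewards, weights,
                           gamma, tau, ridge=1e-8):
    """One weighted regression of soft Bellman targets onto linear features.

    phi_sa:   (n, d) features of the sampled (s, a) pairs
    phi_next: (n, A, d) features of (s', a') for every action a'
    weights:  (n,) nonnegative per-sample weights (assumed given here)
    """
    q_next = phi_next @ theta                      # (n, A) next-state Q-values
    targets = rewards + gamma * soft_value(q_next, tau)
    wx = phi_sa * weights[:, None]                 # weight each sample's row
    gram = phi_sa.T @ wx + ridge * np.eye(phi_sa.shape[1])
    return np.linalg.solve(gram, wx.T @ targets)   # weighted least squares

# Tiny made-up example: two one-hot features and gamma = 0, so the fit is
# just a weighted average of rewards within each feature group.
phi_sa = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
phi_next = np.zeros((4, 2, 2))
rewards = np.array([1.0, 3.0, 2.0, 2.0])
uniform = weighted_soft_fqi_step(np.zeros(2), phi_sa, phi_next, rewards,
                                 np.ones(4), gamma=0.0, tau=1.0)
reweighted = weighted_soft_fqi_step(np.zeros(2), phi_sa, phi_next, rewards,
                                    np.array([1.0, 3.0, 1.0, 1.0]),
                                    gamma=0.0, tau=1.0)
print(uniform, reweighted)
```

With uniform weights the fit corresponds to regression in the sampling (behavior) norm. Changing the weights changes the norm in which the targets are projected, which is the lever the abstract refers to.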
