Fitted $Q$ Evaluation Without Bellman Completeness via Stationary Weighting
Lars van der Laan, Nathan Kallus

TL;DR
This paper introduces stationary-weighted FQE, a new off-policy evaluation method that avoids Bellman completeness assumptions by reweighting regression with stationary density ratios, leading to improved stability.
Contribution
It proposes a novel stationary-weighted FQE approach that aligns the regression norm with the contractive Bellman operator, removing the need for Bellman completeness.
Findings
Stationary weighting stabilizes FQE in practice.
Finite-sample linear convergence to the stationary projected Bellman fixed point is proved without Bellman completeness.
Error bounds show ratio-estimation error diminishes with small Bellman error.
Abstract
Fitted $Q$-evaluation (FQE) is a standard regression-based tool for off-policy evaluation, but existing stability guarantees often rely on Bellman completeness, a strong closure condition that can fail under function approximation. We study an alternative route: changing the norm used in the regression step. The policy-evaluation Bellman operator is contractive in the norm induced by the target policy's stationary state-action distribution, whereas standard off-policy FQE projects Bellman targets in the behavior-distribution norm. We propose stationary-weighted FQE, which reweights each Bellman regression by the stationary target-to-behavior density ratio. The method preserves FQE's modular supervised-learning form while aligning the fitted projection with that contractive norm. We prove finite-sample linear convergence to the stationary projected Bellman fixed point under…
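The reweighted regression step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear $Q$-model, treats the stationary density ratios as already known (in practice they would have to be estimated), and the function name `weighted_fqe` and all of its parameters are hypothetical choices made for illustration.

```python
import numpy as np

def weighted_fqe(phi, phi_next, rewards, weights, gamma=0.9, n_iters=200, ridge=1e-8):
    """Illustrative weighted FQE with a linear Q-model (hypothetical helper).

    phi:      (n, d) features of sampled (s, a) pairs from the behavior data
    phi_next: (n, d) features of (s', a'), with a' drawn from the target policy
    rewards:  (n,) observed rewards
    weights:  (n,) target-to-behavior density ratios, assumed given here
    """
    n, d = phi.shape
    theta = np.zeros(d)
    # The weighted Gram matrix does not change across iterations.
    # A tiny ridge term keeps the solve well-posed.
    gram = phi.T @ (weights[:, None] * phi) + ridge * np.eye(d)
    for _ in range(n_iters):
        # Bellman targets built from the current Q estimate.
        targets = rewards + gamma * (phi_next @ theta)
        # Weighted least-squares regression of the targets on the features.
        theta = np.linalg.solve(gram, phi.T @ (weights * targets))
    return theta
```

Setting `weights` to all ones recovers an ordinary (unweighted) linear FQE iteration, so the only change in this sketch is the per-sample weight in the least-squares step, which is consistent with the abstract's remark that the method keeps FQE's modular supervised-learning form.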
