A Unifying View of Linear Function Approximation in Off-Policy RL Through Matrix Splitting and Preconditioning

Zechen Wu; Amy Greenwald; Ronald Parr

arXiv:2501.01774·cs.LG·December 2, 2025

TL;DR

This paper unifies various off-policy reinforcement learning algorithms under a single linear algebra framework using matrix splitting and preconditioning, providing new insights into their convergence properties.

Contribution

It introduces a novel mathematical perspective that unifies TD, FQI, and PFQI as a single iterative method with different matrix splitting schemes, improving theoretical understanding.

Findings

1. Unified view explains differences in convergence behavior
2. Identifies conditions for convergence without feature independence
3. Shows smaller learning rates can improve convergence in some cases

Abstract

In off-policy policy evaluation (OPE) tasks within reinforcement learning, Temporal Difference Learning (TD) and Fitted Q-Iteration (FQI) have traditionally been viewed as differing in the number of updates toward the target value function: TD makes one update, FQI makes an infinite number, and Partial Fitted Q-Iteration (PFQI) performs a finite number. We show that this view is not accurate, and provide a new mathematical perspective under linear value function approximation that unifies these methods as a single iterative method solving the same linear system, but using different matrix splitting schemes and preconditioners. We show that increasing the number of updates under the same target value function, i.e., the target network technique, is a transition from using a constant preconditioner to using a data-feature adaptive preconditioner. This elucidates, for the first time, why TD…
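The preconditioner view in the abstract can be illustrated with a small numerical sketch. The abstract shown here is truncated and gives no formulas, so everything below is an assumption rather than the paper's own notation or code: the names `Sigma_cov`, `Sigma_cr`, `A`, `b`, the random synthetic MDP, and the use of expected (population) updates instead of sampled ones. Under those assumptions, each method takes the step θ ← θ − M(Aθ − b) on the same linear system, with M = αI for TD, M = Σ_cov⁻¹ for FQI, and M = α Σ_{i<t} (I − αΣ_cov)^i for PFQI with t inner updates against a frozen target.

```python
import numpy as np

# Hypothetical synthetic problem (not from the paper): random features,
# random transition matrix, random sampling distribution, random rewards.
rng = np.random.default_rng(0)
n_states, n_feats, gamma = 6, 3, 0.9

Phi = rng.normal(size=(n_states, n_feats))   # feature matrix
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)            # row-stochastic transitions
d = rng.random(n_states)
d /= d.sum()                                 # off-policy sampling distribution
D = np.diag(d)
r = rng.normal(size=n_states)                # rewards

# Assumed notation for the shared linear system A @ theta = b.
Sigma_cov = Phi.T @ D @ Phi                  # feature covariance
Sigma_cr = Phi.T @ D @ P @ Phi               # cross-covariance with next features
b = Phi.T @ D @ r
A = Sigma_cov - gamma * Sigma_cr


def preconditioned_step(theta, M):
    """One step of the generic iteration theta - M (A theta - b)."""
    return theta - M @ (A @ theta - b)


def pfqi_preconditioner(alpha, t):
    """M = alpha * sum_{i<t} (I - alpha * Sigma_cov)^i."""
    B = np.eye(n_feats) - alpha * Sigma_cov
    term = np.eye(n_feats)
    total = np.zeros((n_feats, n_feats))
    for _ in range(t):
        total += term
        term = term @ B
    return alpha * total


def pfqi_inner_loop(theta, alpha, t):
    """t gradient steps toward a target that stays frozen during the loop."""
    target = gamma * Sigma_cr @ theta + b
    w = theta.copy()
    for _ in range(t):
        w = w - alpha * (Sigma_cov @ w - target)
    return w


# Step size chosen so that (I - alpha * Sigma_cov) has eigenvalues in [0, 1).
alpha = 1.0 / np.linalg.eigvalsh(Sigma_cov).max()
M_td = alpha * np.eye(n_feats)               # constant preconditioner
M_fqi = np.linalg.inv(Sigma_cov)             # data-feature adaptive preconditioner

for t in (1, 5, 50, 500):
    gap = np.linalg.norm(pfqi_preconditioner(alpha, t) - M_fqi)
    print(f"t={t:4d}  distance from FQI preconditioner: {gap:.3e}")
```

In this sketch, `pfqi_preconditioner(alpha, 1)` is exactly the TD preconditioner, and the printed distance to the FQI preconditioner shrinks as `t` grows, which mirrors the transition from a constant to an adaptive preconditioner described above. Whether any of the three iterations converges still depends on the spectrum of I − MA, which the sketch does not check.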

Videos

A Unifying View of Linear Function Approximation in Off-Policy RL Through Matrix Splitting and Preconditioning · SlidesLive

Taxonomy

Topics: Matrix Theory and Algorithms · Electromagnetic Scattering and Analysis · Numerical methods for differential equations
