A Unifying View of Linear Function Approximation in Off-Policy RL Through Matrix Splitting and Preconditioning
Zechen Wu, Amy Greenwald, Ronald Parr

TL;DR
This paper unifies various off-policy reinforcement learning algorithms under a single linear algebra framework using matrix splitting and preconditioning, providing new insights into their convergence properties.
Contribution
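It introduces a novel mathematical perspective, under linear value function approximation, that unifies TD, FQI, and PFQI as a single iterative method. The method solves the same linear system, and the three algorithms differ only in their matrix splitting schemes and preconditioners. This improves theoretical understanding of how they relate.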
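Here is a sketch of the splitting view. It assumes the expected (population) form of the updates, with the following quantities:

- Φ is the feature matrix.
- D is a diagonal matrix holding the data distribution.
- P_π is the target-policy transition matrix.
- R is the reward vector.
- Φᵀ D Φ is invertible.

The labels Σ_cov and Σ_cr are introduced here for illustration. Under these assumptions, all three methods iterate on the same system Aθ = b:

```latex
\begin{aligned}
&\Sigma_{cov} = \Phi^\top D \Phi, \quad
 \Sigma_{cr} = \Phi^\top D P_\pi \Phi, \quad
 A = \Sigma_{cov} - \gamma \Sigma_{cr}, \quad
 b = \Phi^\top D R \\
&A = M - N \;\Longrightarrow\;
 \theta_{k+1} = M^{-1}(N\theta_k + b) = \theta_k + M^{-1}(b - A\theta_k) \\
&\text{TD:}\quad M^{-1} = \alpha I, \quad N = \alpha^{-1} I - A \\
&\text{FQI:}\quad M = \Sigma_{cov}, \quad N = \gamma \Sigma_{cr} \\
&\text{PFQI ($t$ inner steps):}\quad
 \theta_{k+1} = \theta_k + \alpha \sum_{i=0}^{t-1} (I - \alpha \Sigma_{cov})^i \, (b - A\theta_k)
\end{aligned}
```

In this sketch only the preconditioner changes between methods. TD uses a constant αI and FQI uses the data-feature matrix (Φᵀ D Φ)⁻¹. PFQI uses a t-dependent matrix that equals αI at t = 1 and approaches (Φᵀ D Φ)⁻¹ as t grows, provided 0 < α < 2/λ_max(Φᵀ D Φ).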
Findings
Unified view explains differences in convergence behavior
Identifies conditions for convergence without feature independence
Shows smaller learning rates can improve convergence in some cases
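The convergence findings can be explored numerically with a small sketch. Suppose A is nonsingular. Then the iteration θ ← θ + M(b − Aθ) converges from every starting point exactly when the spectral radius of I − MA is below 1.

The sketch below computes that radius for the TD preconditioner (t = 1) and for PFQI preconditioners (t > 1) over a grid of learning rates. All function names are hypothetical. The nonsingular-A assumption is ours, so the singular cases the paper also addresses are not covered here.

```python
import numpy as np


def pfqi_preconditioner(sigma_cov, alpha, t):
    """Hypothetical helper: M_t = alpha * sum_{i=0}^{t-1} (I - alpha * sigma_cov)^i.
    With t = 1 this is the constant TD preconditioner alpha * I."""
    k = sigma_cov.shape[0]
    step = np.eye(k) - alpha * sigma_cov
    term = np.eye(k)  # holds (I - alpha * sigma_cov)^i, starting at i = 0
    M = np.zeros((k, k))
    for _ in range(t):
        M += alpha * term
        term = term @ step
    return M


def iteration_matrix(A, M):
    """Error-propagation matrix H = I - M A of theta <- theta + M (b - A theta)."""
    return np.eye(A.shape[0]) - M @ A


def spectral_radius(H):
    """Largest eigenvalue magnitude; for nonsingular A, rho(H) < 1 iff the iteration converges."""
    return float(np.max(np.abs(np.linalg.eigvals(H))))


def convergence_table(sigma_cov, A, alphas, ts):
    """rho(I - M_t A) for every (learning rate, inner-step count) pair."""
    return {
        (alpha, t): spectral_radius(iteration_matrix(A, pfqi_preconditioner(sigma_cov, alpha, t)))
        for alpha in alphas
        for t in ts
    }
```

As a sanity check, take Σ_cov = A = diag(1, 4), which corresponds to γ = 0, and t = 1. With α = 0.4 the radius is 0.6, but with α = 0.6 it is 1.4. In this toy case the larger learning rate diverges.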
Abstract
In off-policy policy evaluation (OPE) tasks within reinforcement learning, Temporal Difference Learning (TD) and Fitted Q-Iteration (FQI) have traditionally been viewed as differing in the number of updates toward the target value function: TD makes one update, FQI makes an infinite number, and Partial Fitted Q-Iteration (PFQI) performs a finite number. We show that this view is not accurate, and provide a new mathematical perspective under linear value function approximation that unifies these methods as a single iterative method solving the same linear system, but using different matrix splitting schemes and preconditioners. We show that increasing the number of updates under the same target value function, i.e., the target network technique, is a transition from using a constant preconditioner to using a data-feature adaptive preconditioner. This elucidates, for the first time, why TD…
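The target-network claim can be checked numerically on a small example. This sketch makes several assumptions of its own:

- It uses the expected (population) updates.
- It uses a synthetic random off-policy problem.
- It assumes Φᵀ D Φ is invertible.

All function names are hypothetical. Under these assumptions, PFQI freezes the target and then takes t gradient steps toward it. That is algebraically the same as one step θ ← θ + M_t(b − Aθ) with M_t = α Σ_{i<t}(I − αΦᵀDΦ)^i. With t = 1 this reduces to TD's constant preconditioner αI. As t grows, M_t approaches the data-dependent (Φᵀ D Φ)⁻¹ used by FQI, provided 0 < α < 2/λ_max(Φᵀ D Φ).

```python
import numpy as np


def make_problem(n_states=6, n_features=3, gamma=0.9, seed=0):
    """Synthetic off-policy evaluation problem (random data, for illustration only)."""
    rng = np.random.default_rng(seed)
    Phi = rng.normal(size=(n_states, n_features))  # feature matrix
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)              # target-policy transition matrix P_pi
    mu = rng.random(n_states) + 0.1
    D = np.diag(mu / mu.sum())                     # arbitrary data distribution (off-policy)
    R = rng.normal(size=n_states)                  # expected rewards
    sigma_cov = Phi.T @ D @ Phi                    # Phi^T D Phi
    sigma_cr = Phi.T @ D @ P @ Phi                 # Phi^T D P_pi Phi
    A = sigma_cov - gamma * sigma_cr               # linear system A theta = b
    b = Phi.T @ D @ R
    return sigma_cov, sigma_cr, A, b, gamma


def pfqi_preconditioner(sigma_cov, alpha, t):
    """M_t = alpha * sum_{i=0}^{t-1} (I - alpha * sigma_cov)^i; t = 1 gives alpha * I."""
    k = sigma_cov.shape[0]
    step = np.eye(k) - alpha * sigma_cov
    term = np.eye(k)
    M = np.zeros((k, k))
    for _ in range(t):
        M += alpha * term
        term = term @ step
    return M


def pfqi_step(theta, sigma_cov, sigma_cr, b, gamma, alpha, t):
    """One outer PFQI iteration: freeze the target at theta, then take t gradient steps."""
    target = gamma * sigma_cr @ theta + b         # frozen target (target network)
    w = theta.copy()
    for _ in range(t):
        # gradient step on the D-weighted squared error toward the frozen target
        w = w + alpha * (target - sigma_cov @ w)
    return w


def preconditioned_step(theta, A, b, M):
    """Generic preconditioned iteration theta <- theta + M (b - A theta)."""
    return theta + M @ (b - A @ theta)
```

Setting t = 1 reproduces the expected TD update. Replacing M_t with (Φᵀ D Φ)⁻¹ reproduces the FQI update. Intermediate t yields PFQI.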
Taxonomy
Topics: Matrix Theory and Algorithms · Electromagnetic Scattering and Analysis · Numerical methods for differential equations
