TD Convergence: An Optimization Perspective
Kavosh Asadi, Shoham Sabach, Yao Liu, Omer Gottesman, Rasool Fakoor

TL;DR
This paper analyzes the convergence of temporal-difference (TD) learning from an optimization perspective, identifying two forces that determine whether the algorithm converges or diverges and extending convergence results beyond linear approximation with squared loss.
Contribution
It frames TD as an iterative optimization algorithm whose objective changes at every iteration, identifies two forces that govern its convergence or divergence, and broadens convergence results beyond linear approximation with squared loss.
Findings
Identifies two forces whose interplay determines whether TD converges or diverges.
Proves convergence conditions for linear TD with quadratic loss.
Extends the convergence analysis to a broader setting than linear approximation and squared loss.
Abstract
We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counterexample, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning.
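The idea that TD takes a gradient step on an objective that changes every iteration can be illustrated with a small sketch. It uses the well-known two-state "theta -> 2*theta" example, chosen here only for illustration and not necessarily the counterexample studied in the paper. The names td_update and run_td, the step size, and the discount values are all assumptions of this sketch.

```python
def td_update(theta, alpha, gamma):
    """One semi-gradient TD(0) step on a two-state example (hypothetical setup).

    Assumed setup: v(s1) = theta * 1 and v(s2) = theta * 2, and the only
    sampled transition is s1 -> s2 with reward 0.
    """
    # The bootstrapped target is computed from the current theta and then
    # held fixed, so the objective below changes from one iteration to the next.
    target = 0.0 + gamma * 2.0 * theta
    # Gradient of 0.5 * (target - w * 1)**2 with respect to w, evaluated at w = theta.
    grad = -(target - theta) * 1.0
    return theta - alpha * grad


def run_td(theta0, alpha, gamma, steps):
    """Apply td_update repeatedly and return the final parameter."""
    theta = theta0
    for _ in range(steps):
        theta = td_update(theta, alpha, gamma)
    return theta


if __name__ == "__main__":
    # Each step multiplies theta by (1 + alpha * (2 * gamma - 1)).
    print("gamma = 0.9:", run_td(1.0, 0.1, 0.9, 200))   # factor above 1, grows
    print("gamma = 0.4:", run_td(1.0, 0.1, 0.4, 2000))  # factor below 1, shrinks
```

In this toy setup each step rescales theta by 1 + alpha * (2 * gamma - 1), so the iterates grow without bound when gamma exceeds 0.5 and shrink toward zero when it is below 0.5. This only illustrates that the moving target can outweigh the pull of the gradient step; it is not a restatement of the paper's formal conditions.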
Taxonomy
Topics: Evolutionary Algorithms and Applications · Neural Networks and Reservoir Computing · Gene Regulatory Network Analysis
