Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion
Hwanwoo Kim, Panos Toulis, Eric Laber

TL;DR
This paper introduces implicit TD algorithms that reformulate traditional TD updates into fixed point equations, significantly enhancing stability and robustness to step size choices in reinforcement learning.
Contribution
The paper proposes a novel implicit formulation of TD algorithms, providing theoretical convergence guarantees and demonstrating improved stability over traditional methods.
Findings
Implicit TD algorithms are less sensitive to step size.
They achieve stable convergence with a broader range of step sizes.
Empirical results show improved performance in RL tasks.
Abstract
Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized algorithms. However, despite its widespread use, TD procedures are generally sensitive to step size specification. A poor choice of step size can dramatically increase variance and slow convergence in both on-policy and off-policy evaluation tasks. In practice, researchers use trial and error to identify stable step sizes, but these approaches tend to be ad hoc and inefficient. As an alternative, we propose implicit TD algorithms that reformulate TD updates into fixed point equations. Such updates are more stable and less sensitive to step size without sacrificing computational efficiency. Moreover, we derive asymptotic convergence guarantees and finite-time…
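To make the fixed point idea concrete, here is a minimal sketch of one way a TD(0) update with linear function approximation can be made implicit. It rests on assumptions: this summary does not spell out the paper's update rule, so the form below (the new parameter appears on both sides of the update through the current-state value) is illustrative rather than the authors' exact algorithm. The names `td0_step` and `run_two_state_cycle`, the two-state reward process, and all constants are hypothetical. Under this assumed form, solving the fixed point equation in closed form only rescales the step size from alpha to alpha / (1 + alpha * ||phi||^2), so each update costs about the same as the standard one.

```python
import numpy as np


def td0_step(theta, phi, reward, phi_next, alpha, gamma, implicit):
    """One TD(0) update with linear value estimate v(s) = phi(s) @ theta.

    Assumed implicit form (illustrative, not taken from the paper):
        theta_new = theta + alpha * (reward + gamma * phi_next @ theta
                                     - phi @ theta_new) * phi
    Its solution is a standard TD step with a shrunken step size.
    """
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error
    if implicit:
        step = alpha / (1.0 + alpha * (phi @ phi))  # closed-form fixed point
    else:
        step = alpha  # standard (explicit) TD(0)
    return theta + step * delta * phi


def run_two_state_cycle(alpha, implicit, n_steps=200, gamma=0.5):
    """Hypothetical toy problem: states 0 and 1 alternate, reward 1 per step.

    Features are scaled one-hot vectors, so the true value 1 / (1 - gamma) = 2
    corresponds to theta = (1, 1).
    """
    features = 2.0 * np.eye(2)
    theta = np.zeros(2)
    state = 0
    for _ in range(n_steps):
        next_state = 1 - state
        theta = td0_step(theta, features[state], 1.0, features[next_state],
                         alpha, gamma, implicit)
        state = next_state
    return theta


# Deliberately oversized step size for this feature scale.
theta_explicit = run_two_state_cycle(alpha=1.5, implicit=False)
theta_implicit = run_two_state_cycle(alpha=1.5, implicit=True)
print("explicit TD(0) parameter norm:", np.linalg.norm(theta_explicit))
print("implicit TD(0) parameters:", theta_implicit)
```

In this toy setting the standard iterate grows without bound at alpha = 1.5, while the implicit iterate settles near (1, 1). This is a single hand-picked example and not evidence for the paper's empirical claims.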
Taxonomy
Methods: High-Order Consensuses
