Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion

Hwanwoo Kim; Panos Toulis; Eric Laber

arXiv:2505.01361 · cs.LG · June 24, 2025

TL;DR

This paper introduces implicit TD algorithms that reformulate traditional TD updates into fixed point equations, significantly enhancing stability and robustness to step size choices in reinforcement learning.
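To make the fixed-point idea concrete, here is a worked sketch for TD(0) with a linear value estimate. The notation is assumed for illustration and does not appear in this summary: features \phi_n, reward r_n, discount \gamma, step size \alpha_n, and TD error \delta_n. It shows one plausible implicit variant, in which the new iterate appears on both sides of the update, and how that equation can be solved in closed form. The paper's exact formulation may differ.

```latex
% Assumed notation: \phi_n = \phi(s_n), \delta_n = r_n + \gamma \phi_{n+1}^\top \theta_n - \phi_n^\top \theta_n
\begin{align*}
\text{standard TD(0):}\quad
  \theta_{n+1} &= \theta_n + \alpha_n \delta_n \phi_n \\
\text{implicit TD(0), fixed-point form:}\quad
  \theta_{n+1} &= \theta_n + \alpha_n \bigl(r_n + \gamma \phi_{n+1}^\top \theta_n - \phi_n^\top \theta_{n+1}\bigr)\phi_n \\
\text{left-multiply by } \phi_n^\top:\quad
  \phi_n^\top(\theta_{n+1} - \theta_n) &= \alpha_n \|\phi_n\|^2 \bigl(\delta_n - \phi_n^\top(\theta_{n+1} - \theta_n)\bigr) \\
\Rightarrow\quad
  \phi_n^\top(\theta_{n+1} - \theta_n) &= \frac{\alpha_n \|\phi_n\|^2}{1 + \alpha_n \|\phi_n\|^2}\,\delta_n \\
\text{substitute back:}\quad
  \theta_{n+1} &= \theta_n + \frac{\alpha_n}{1 + \alpha_n \|\phi_n\|^2}\,\delta_n \phi_n
\end{align*}
```

Under these assumptions the implicit step is a standard TD(0) step with a shrunken effective step size that never exceeds 1/\|\phi_n\|^2, and it costs only one extra inner product per update.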

Contribution

The paper proposes a novel implicit formulation of TD algorithms, providing theoretical convergence guarantees and demonstrating improved stability over traditional methods.

Findings

01. Implicit TD algorithms are less sensitive to step size.

02. They achieve stable convergence with a broader range of step sizes.

03. Empirical results show improved performance in RL tasks.
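The step-size claim can be illustrated with a small numerical sketch. It assumes a linear value estimate and an implicit TD(0) step of the form theta + alpha / (1 + alpha * ||phi||^2) * delta * phi, which is one plausible closed form of a fixed-point update and is not taken from this summary. The one-state toy problem, the function names, and all constants below are hypothetical choices made for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def td_error(theta, phi, reward, phi_next, gamma):
    """TD error for a linear value estimate v(s) = phi(s) . theta."""
    return reward + gamma * dot(phi_next, theta) - dot(phi, theta)


def td0_step(theta, phi, reward, phi_next, gamma, alpha):
    """Standard (explicit) TD(0) step."""
    delta = td_error(theta, phi, reward, phi_next, gamma)
    return [t + alpha * delta * p for t, p in zip(theta, phi)]


def implicit_td0_step(theta, phi, reward, phi_next, gamma, alpha):
    """Assumed implicit TD(0) step: same direction, shrunken step size."""
    delta = td_error(theta, phi, reward, phi_next, gamma)
    shrunk = alpha / (1.0 + alpha * dot(phi, phi))
    return [t + shrunk * delta * p for t, p in zip(theta, phi)]


def run(step, alpha, n_steps=50):
    """Iterate on a hypothetical one-state chain: self-loop, reward 1."""
    phi = [2.0]      # feature of the single state
    gamma = 0.5      # true value is 1 / (1 - 0.5) = 2, so theta* = 1
    theta = [0.0]
    for _ in range(n_steps):
        theta = step(theta, phi, 1.0, phi, gamma, alpha)
    return theta


# A deliberately large step size, chosen to break the explicit update.
theta_explicit = run(td0_step, alpha=1.5)
theta_implicit = run(implicit_td0_step, alpha=1.5)
print("explicit:", theta_explicit[0])
print("implicit:", theta_implicit[0])
```

In this toy problem the explicit error is multiplied by 1 - 2 * alpha = -2 at every step, so it blows up, while the assumed implicit update multiplies the error by (1 + 2 * alpha) / (1 + 4 * alpha) = 4/7 and settles near theta* = 1. The example is deterministic so that the contrast does not depend on a random seed; it says nothing about the stochastic or off-policy settings the paper studies.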

Abstract

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized algorithms. However, despite its widespread use, TD procedures are generally sensitive to step size specification. A poor choice of step size can dramatically increase variance and slow convergence in both on-policy and off-policy evaluation tasks. In practice, researchers use trial and error to identify stable step sizes, but these approaches tend to be ad hoc and inefficient. As an alternative, we propose implicit TD algorithms that reformulate TD updates into fixed point equations. Such updates are more stable and less sensitive to step size without sacrificing computational efficiency. Moreover, we derive asymptotic convergence guarantees and finite-time…


Taxonomy

Methods: High-Order Consensuses