On the Divergence of Differential Temporal Difference Learning without Local Clocks

David Antrobius; Shangtong Zhang

arXiv:2605.06874·cs.LG·May 11, 2026

TL;DR

This paper shows that in average-reward reinforcement learning, differential temporal difference learning converges with a local clock yet can diverge with a global clock, so the choice of clock can decide whether the algorithm converges.

Contribution

It provides the first counterexample showing divergence with a global clock in average-reward RL, resolving an open problem from prior research.

Findings

01

Differential TD learning converges with a local clock but can diverge with a global clock in average-reward RL.

02

The divergence counterexample addresses an open problem from Wan et al. (2021) and Blaser et al. (2026).

03

In discounted RL, convergence with a local clock and convergence with a global clock go hand in hand, with no known counterexample; in average-reward RL this correspondence breaks down.

Abstract

Learning rate is a critical component of reinforcement learning (RL). This work uses global and local clocks to distinguish two types of learning rates. The former is of the standard form $\alpha_{t}$ that depends only on the time step $t$ (i.e., a global clock). The latter is of the form $\alpha_{\nu(S_{t}, t)}$, where $\nu(s, t)$ counts the number of visits to state $s$ until time $t$ (i.e., a local clock). In discounted RL, an RL algorithm that is convergent with a local clock is always also convergent with a global clock, and vice versa. We are not aware of any counterexample. The key contribution of this work is to show that this nice correspondence breaks down in average-reward RL. Specifically, we construct a counterexample showing that although differential temporal difference learning is convergent with a local clock, it can diverge with a global clock. This counterexample closes…
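
To make the distinction between the two clocks concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's construction: the schedule `step_size(n) = 1/n`, the convention that the visit count includes the current visit, the helper names, and the parameter `eta` are all hypothetical. The update in `differential_td_step` follows the commonly stated tabular form of differential TD learning and is not taken from this page.

```python
from collections import defaultdict


def step_size(n):
    """Hypothetical schedule alpha_n = 1 / n (an assumption for illustration)."""
    return 1.0 / n


def learning_rates(states, clock):
    """Return the learning rate applied at each step of a state trajectory.

    clock="global": the rate depends only on the time step t.
    clock="local":  the rate depends on how often the current state was visited
                    (assumed here to include the current visit).
    """
    visits = defaultdict(int)
    rates = []
    for t, s in enumerate(states, start=1):
        visits[s] += 1
        n = t if clock == "global" else visits[s]
        rates.append(step_size(n))
    return rates


def differential_td_step(v, r_bar, s, r, s_next, alpha, eta=1.0):
    """One tabular differential TD update (assumed standard form).

    v is a dict of state values and is updated in place; the new
    average-reward estimate is returned. eta is a hypothetical
    relative step size for the average-reward estimate.
    """
    delta = r - r_bar + v[s_next] - v[s]  # differential TD error
    v[s] += alpha * delta
    return r_bar + eta * alpha * delta


trajectory = ["a", "b", "a", "a"]
global_rates = learning_rates(trajectory, "global")
local_rates = learning_rates(trajectory, "local")
```

On this trajectory the global clock shrinks the rate at every step regardless of which state is visited, while the local clock shrinks it only for the state being updated, so rarely visited states keep larger rates for longer.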
