On the Divergence of Differential Temporal Difference Learning without Local Clocks
David Antrobius, Shangtong Zhang

TL;DR
This paper demonstrates that in average-reward reinforcement learning, differential temporal difference learning can diverge with a global clock even though it converges with a local clock, a gap with no known counterpart in discounted RL.
Contribution
It provides the first counterexample showing divergence with a global clock in average-reward RL, resolving an open problem from prior research.
Findings
Differential TD learning converges with a local clock but can diverge with a global clock in average-reward RL.
The divergence counterexample addresses an open problem from Wan et al. (2021) and Blaser et al. (2026).
In discounted RL, convergence with a local clock and convergence with a global clock are equivalent, unlike in average-reward RL.
Abstract
Learning rate is a critical component of reinforcement learning (RL). This work uses global and local clocks to distinguish two types of learning rates. The former is of the standard form $\alpha_t$ that depends only on the time step $t$ (i.e., a global clock). The latter is of the form $\alpha_{\nu(S_t, t)}$, where $\nu(s, t)$ counts the number of visits to state $s$ until time $t$ (i.e., a local clock). In discounted RL, an RL algorithm that is convergent with a local clock is always also convergent with a global clock, and vice versa. We are not aware of any counterexample. The key contribution of this work is to show that this nice correspondence breaks down in average-reward RL. Specifically, we construct a counterexample showing that although differential temporal difference learning is convergent with a local clock, it can diverge with a global clock. This counterexample closes…
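To make the two learning-rate types concrete, here is a minimal sketch of tabular differential TD learning in which the only difference between the two modes is the index passed to the step-size schedule: the time step (global clock) or the visit count of the current state (local clock). The update rule follows the commonly stated tabular form of differential TD; the two-state reward process, the schedule `step_size`, and the parameter `eta` are hypothetical choices made for illustration. This is not the paper's counterexample and is not expected to reproduce the divergence.

```python
import random


def step_size(n):
    """Hypothetical step-size schedule alpha_n = 1 / n, for n >= 1."""
    return 1.0 / n


def differential_td(num_steps, clock="local", eta=1.0, seed=0):
    """Run tabular differential TD on a toy 2-state Markov reward process.

    clock="global": the step size is indexed by the time step t.
    clock="local":  the step size is indexed by the number of visits
                    to the current state so far.
    """
    rng = random.Random(seed)
    # Toy dynamics, made up for illustration (not from the paper):
    # P[s][0] is the probability of moving to state 0 from state s,
    # R[s] is the reward received when leaving state s.
    P = {0: [0.9, 0.1], 1: [0.5, 0.5]}
    R = {0: 1.0, 1: 0.0}

    v = [0.0, 0.0]    # differential value estimates
    r_bar = 0.0       # average-reward estimate
    visits = [0, 0]   # per-state visit counters (the local clocks)
    s = 0
    for t in range(1, num_steps + 1):
        visits[s] += 1
        s_next = 0 if rng.random() < P[s][0] else 1
        # Differential TD error: the reward is centered by r_bar.
        delta = R[s] - r_bar + v[s_next] - v[s]
        # The only place where the two clocks differ.
        n = t if clock == "global" else visits[s]
        alpha = step_size(n)
        v[s] += alpha * delta
        r_bar += eta * alpha * delta
        s = s_next
    return v, r_bar, visits


if __name__ == "__main__":
    for clock in ("local", "global"):
        v, r_bar, visits = differential_td(10_000, clock=clock)
        print(clock, v, r_bar, visits)
```

With a global clock every state shares one decaying step size, so a rarely visited state receives small updates regardless of how few times it has been seen; with a local clock each state's step size decays only as that state is visited.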
