Deep Reinforcement Learning and The Tale of Two Temporal Difference Errors

Juan Sebastian Rojas; Chi-Guhn Lee

arXiv:2603.21921·cs.LG·March 24, 2026

Open Access

TL;DR

This paper examines two interpretations of the temporal difference (TD) error in deep reinforcement learning, showing that they can yield different numerical values in nonlinear architectures and that the choice between them can affect algorithm performance.

Contribution

The paper shows that the commonly assumed equivalence between the two TD error interpretations does not always hold in deep RL, particularly with nonlinear models, and that the choice of interpretation can affect algorithm performance.

Findings

01. The two TD error interpretations can diverge in nonlinear deep RL models.

02. The choice of interpretation affects the performance of RL algorithms.

03. The default bootstrapped-target interpretation may not always be valid in deep RL.

Abstract

The temporal difference (TD) error was first formalized in Sutton (1988), where it was initially characterized as the difference between temporally successive predictions, and later, in that same work, formulated as the difference between a bootstrapped target and a prediction. Since then, these two interpretations of the TD error have been used interchangeably in the literature, with the latter eventually being adopted as the standard critic loss in deep reinforcement learning (RL) architectures. In this work, we show that these two interpretations of the TD error are not always equivalent. In particular, we show that increasingly nonlinear deep RL architectures can cause these interpretations of the TD error to yield increasingly different numerical values. Then, building on this insight, we show how choosing one interpretation of the TD error over the other can affect the performance of…
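
To make the two interpretations named in the abstract concrete, here is a minimal Python sketch using the textbook forms from Sutton (1988). The function names and the numeric values are hypothetical, chosen only for illustration. The sketch shows the two definitions and the special case in which they coincide. It does not reproduce the paper's analysis of how they diverge under nonlinear function approximation, which the truncated abstract does not spell out.

```python
# Illustrative sketch only: function names and values are assumptions,
# not taken from the paper.

def td_error_successive(pred_next: float, pred_curr: float) -> float:
    """TD error as the difference between temporally successive predictions."""
    return pred_next - pred_curr


def td_error_bootstrapped(reward: float, gamma: float,
                          value_next: float, value_curr: float) -> float:
    """TD error as (bootstrapped target) minus (current prediction)."""
    target = reward + gamma * value_next  # bootstrapped target
    return target - value_curr


# Hypothetical predictions for two consecutive states.
v_curr, v_next = 0.4, 0.7

# Successive-prediction form.
delta_a = td_error_successive(v_next, v_curr)

# Bootstrapped-target form with zero reward and no discounting:
# reduces to the same expression as the successive-prediction form.
delta_b = td_error_bootstrapped(0.0, 1.0, v_next, v_curr)

# Bootstrapped-target form with a nonzero reward and discount factor.
delta_c = td_error_bootstrapped(1.0, 0.9, v_next, v_curr)

print(delta_a, delta_b, delta_c)
```

With a scalar prediction per state, the two forms differ only in how the target is written. The abstract's claim concerns what happens once the predictions come from a nonlinear deep network, which this sketch deliberately leaves out.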

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning