An Experimental Comparison Between Temporal Difference and Residual Gradient with Neural Network Approximation
Shuyu Yin, Tao Luo, Peilin Liu, Zhi-Qin John Xu

TL;DR
This paper compares Temporal Difference and Residual Gradient methods in deep Q-learning, showing TD generally outperforms RG in policy quality and robustness, and explores why incomplete gradient methods behave differently in reinforcement learning.
Contribution
The study provides extensive empirical evidence that TD outperforms RG in deep Q-learning and examines how the gradient term that TD omits affects performance.
Findings
TD achieves better policies and robustness than RG.
Small Bellman residual does not always indicate good policy in RL.
The gradient term that TD omits, and RG retains, is a key reason for RG's poor performance.
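For readers unfamiliar with the two methods, the difference can be written out for a single transition. This is a sketch in standard Q-learning notation, not copied from the paper: the symbols Q_\theta, \delta, \gamma, and a^* are assumptions introduced here for illustration.

```latex
% One transition (s, a, r, s') with discount factor \gamma.
% Bellman residual loss and TD error:
\[
  L(\theta) = \tfrac{1}{2}\,\delta^{2},
  \qquad
  \delta = r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a),
  \qquad
  a^{*} = \arg\max_{a'} Q_\theta(s', a').
\]
% Residual Gradient (RG) follows the full gradient of L:
\[
  \nabla_\theta L
  = -\delta \left( \nabla_\theta Q_\theta(s, a)
      - \gamma \nabla_\theta Q_\theta(s', a^{*}) \right).
\]
% Temporal Difference (TD) treats the target as a constant:
\[
  g_{\mathrm{TD}} = -\delta \, \nabla_\theta Q_\theta(s, a).
\]
% The term TD leaves out:
\[
  \nabla_\theta L - g_{\mathrm{TD}}
  = \gamma \, \delta \, \nabla_\theta Q_\theta(s', a^{*}).
\]
```

In this notation, the "missing term" in the finding above is the last expression: the part of the gradient that flows through the bootstrapped target.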
Abstract
Gradient descent or its variants are popular in training neural networks. However, in deep Q-learning with neural network approximation, a type of reinforcement learning, gradient descent (also known as Residual Gradient (RG)) is rarely used to solve the Bellman residual minimization problem. On the contrary, Temporal Difference (TD), an incomplete gradient descent method, prevails. In this work, we perform extensive experiments to show that TD outperforms RG; that is, when the training leads to a small Bellman residual error, the solution found by TD has a better policy and is more robust against perturbation of the neural network parameters. We further use experiments to reveal a key difference between reinforcement learning and supervised learning: a small Bellman residual error can correspond to a bad policy in reinforcement learning, while the test loss function in supervised…
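The contrast between a full gradient step (RG) and an incomplete one (TD) can be checked numerically on one transition. The following is a minimal sketch, assuming a linear Q-function over hand-picked features rather than the neural networks used in the paper; the function names `bellman_loss` and `td_and_rg_gradients` and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def bellman_loss(theta, phi_sa, phi_next, reward, gamma):
    """Half squared Bellman residual for one transition (linear Q assumed)."""
    q_next = phi_next @ theta  # Q(s', a') for every next action a'
    return 0.5 * (phi_sa @ theta - reward - gamma * q_next.max()) ** 2


def td_and_rg_gradients(theta, phi_sa, phi_next, reward, gamma):
    """Return (TD semi-gradient, RG full gradient, term omitted by TD)."""
    q_sa = phi_sa @ theta
    q_next = phi_next @ theta
    a_star = int(np.argmax(q_next))  # greedy next action
    delta = reward + gamma * q_next[a_star] - q_sa  # TD error

    # TD: differentiate only through Q(s, a); the target is held constant.
    grad_td = -delta * phi_sa
    # RG: differentiate through both Q(s, a) and the bootstrapped target.
    grad_rg = -delta * (phi_sa - gamma * phi_next[a_star])
    # The part of the true gradient that TD drops.
    missing = gamma * delta * phi_next[a_star]
    return grad_td, grad_rg, missing


# Hypothetical toy transition: 2 features, 2 next actions.
theta = np.array([1.0, -0.5])
phi_sa = np.array([1.0, 0.0])
phi_next = np.array([[0.0, 1.0], [1.0, 1.0]])
reward, gamma = 1.0, 0.9

grad_td, grad_rg, missing = td_and_rg_gradients(
    theta, phi_sa, phi_next, reward, gamma
)
print("TD semi-gradient:", grad_td)
print("RG full gradient:", grad_rg)
print("term omitted by TD:", missing)
```

Only `grad_rg` is the gradient of `bellman_loss`; `grad_td` is not the gradient of any fixed objective, which is the sense in which the abstract calls TD an incomplete gradient descent method.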
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
Methods: Q-Learning
