An Experimental Comparison Between Temporal Difference and Residual Gradient with Neural Network Approximation

Shuyu Yin; Tao Luo; Peilin Liu; Zhi-Qin John Xu

arXiv:2205.12770 · cs.LG · November 15, 2022

TL;DR

This paper compares Temporal Difference and Residual Gradient methods in deep Q-learning, showing TD generally outperforms RG in policy quality and robustness, and explores why incomplete gradient methods behave differently in reinforcement learning.

Contribution

The study provides extensive empirical evidence that TD outperforms RG in deep Q-learning and reveals the impact of missing terms in TD on performance.

Findings

1. TD achieves better policies and greater robustness than RG.
2. A small Bellman residual does not always indicate a good policy in RL.
3. The term that TD omits from the full gradient is a key reason for RG's poor performance.
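The second finding can be illustrated with a toy example. The following is a hypothetical sketch, not an experiment from the paper. It assumes a single-state task with two actions, `a0` and `a1`, where every action ends the episode, so the Bellman target for Q(s, a) is just the reward. One Q-estimate has a tiny Bellman residual but ranks the actions in the wrong order. Another has a much larger residual but still picks the optimal action.

```python
# Hypothetical toy task (numbers chosen for illustration, not from the paper):
# one state, two actions, each action terminates the episode, so the Bellman
# target for Q(s, a) is simply r(a) and the true Q*(s, a) equals r(a).
rewards = {"a0": 1.0, "a1": 0.9}  # a0 is the optimal action


def bellman_residual(q):
    """Mean squared Bellman residual; next state is terminal, so target = r(a)."""
    return sum((q[a] - rewards[a]) ** 2 for a in q) / len(q)


def greedy(q):
    """Action selected by the greedy policy induced by the Q-estimate q."""
    return max(q, key=q.get)


# Errors of 0.06 per action, but the ordering of the two actions is flipped.
q_small_residual = {"a0": 0.94, "a1": 0.96}
# Errors of 1.0 and 0.9, but the ordering of the two actions is preserved.
q_large_residual = {"a0": 2.0, "a1": 0.0}

for name, q in [("small residual", q_small_residual),
                ("large residual", q_large_residual)]:
    print(name, round(bellman_residual(q), 4), greedy(q))
```

The estimate with the smaller residual selects the suboptimal action `a1`. The estimate with the larger residual selects `a0`. A policy depends only on the ordering of Q-values across actions, and the squared residual does not measure that ordering.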

Abstract

Gradient descent and its variants are popular for training neural networks. However, in deep Q-learning with neural network approximation, a type of reinforcement learning, gradient descent (also known as Residual Gradient (RG)) is rarely used to solve the Bellman residual minimization problem. Instead, Temporal Difference (TD), an incomplete gradient descent method, prevails. In this work, we perform extensive experiments to show that TD outperforms RG; that is, when training leads to a small Bellman residual error, the solution found by TD has a better policy and is more robust against perturbations of the neural network parameters. We further use experiments to reveal a key difference between reinforcement learning and supervised learning: a small Bellman residual error can correspond to a bad policy in reinforcement learning, while the test loss function in supervised…
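The contrast between RG and TD can be shown on a single transition. The sketch below makes several assumptions that do not come from the paper: a linear Q-function Q(s, a) = w · φ(s, a), hand-picked feature vectors, and hypothetical helper names. The paper itself uses neural networks. With δ = r + γ max_a' Q(s', a') − Q(s, a), RG runs gradient descent on ½δ² and differentiates both Q terms. TD treats the target as a constant. The two updates therefore differ by exactly −η γ δ φ(s', a*), where a* is the greedy next action and η is the learning rate. This is the term that TD leaves out.

```python
# Sketch under stated assumptions: linear Q(s, a) = w . phi(s, a), one
# transition (s, a, r, s'), hypothetical features; not the paper's code.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def td_error(w, phi_sa, r, next_phis, gamma):
    """delta = r + gamma * max_a' Q(s', a') - Q(s, a); also returns phi(s', a*)."""
    phi_next = max(next_phis, key=lambda p: dot(w, p))  # greedy next action a*
    delta = r + gamma * dot(w, phi_next) - dot(w, phi_sa)
    return delta, phi_next


def rg_step(w, phi_sa, r, next_phis, gamma, lr):
    """Residual gradient: descend on 0.5 * delta**2, differentiating both Q terms
    (the max is differentiated at its argmax a*)."""
    delta, phi_next = td_error(w, phi_sa, r, next_phis, gamma)
    grad = [delta * (gamma * pn - ps) for ps, pn in zip(phi_sa, phi_next)]
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def td_step(w, phi_sa, r, next_phis, gamma, lr):
    """TD (semi-gradient): the target is held fixed, so only -Q(s, a) is differentiated."""
    delta, _ = td_error(w, phi_sa, r, next_phis, gamma)
    return [wi + lr * delta * ps for wi, ps in zip(w, phi_sa)]


# Hypothetical example transition.
w = [0.5, -0.2]
phi_sa = [1.0, 0.0]
next_phis = [[0.0, 1.0], [1.0, 1.0]]  # features of the two actions in s'
w_td = td_step(w, phi_sa, 1.0, next_phis, gamma=0.9, lr=0.1)
w_rg = rg_step(w, phi_sa, 1.0, next_phis, gamma=0.9, lr=0.1)

# The updates differ by -lr * gamma * delta * phi(s', a*), the term TD drops.
diff = [g - t for g, t in zip(w_rg, w_td)]
print(w_td, w_rg, diff)
```

In this example TD moves only the weight attached to φ(s, a). RG also pulls Q(s', a*) down, because minimizing the residual can lower the target as well as raise the estimate.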

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Stochastic Gradient Optimization Techniques

Methods: Q-Learning