An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks
Zhifa Ke, Zaiwen Wen, Junyu Zhang

TL;DR
This paper provides a new finite-time analysis of neural TD learning, achieving an improved sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-1})$ under Markovian sampling and advancing the theoretical understanding of neural network-based reinforcement learning.
Contribution
It introduces novel proof techniques and derives the first finite-time analysis with $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity for neural TD under Markovian sampling.
Findings
Achieves $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity for neural TD.
Develops new proof techniques for non-asymptotic analysis.
First finite-time analysis of neural TD to reach $\tilde{\mathcal{O}}(\epsilon^{-1})$ complexity under Markovian sampling.
Abstract
Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network. New proof techniques are developed and an improved $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity is derived. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an $\tilde{\mathcal{O}}(\epsilon^{-1})$ complexity under Markovian sampling, as opposed to the best known $\tilde{\mathcal{O}}(\epsilon^{-2})$ complexity in the existing literature.
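For readers less familiar with the method, the update that "neural TD" usually refers to can be sketched as the generic semi-gradient TD(0) rule below. This is the textbook form, not necessarily the exact variant analyzed in the paper (which may add steps such as projection or averaging). The symbols $\theta$ (network weights), $\alpha$ (step size), $\gamma$ (discount factor), $\delta_t$ (TD error) and $Q(s, a; \theta)$ (action-value network) are notation assumed here for illustration, since this summary defines none of them.

```latex
% Generic semi-gradient TD(0) for an action-value network Q(s, a; \theta).
% Notation is assumed for illustration and is not taken from the paper.
\begin{align*}
\delta_t     &= r_t + \gamma\, Q(s_{t+1}, a_{t+1}; \theta_t) - Q(s_t, a_t; \theta_t), \\
\theta_{t+1} &= \theta_t + \alpha\, \delta_t\, \nabla_\theta Q(s_t, a_t; \theta_t).
\end{align*}
```

Under Markovian sampling, the tuples $(s_t, a_t, r_t, s_{t+1}, a_{t+1})$ come from a single trajectory of the policy being evaluated, so consecutive samples are correlated rather than independent. A sample-complexity bound counts how many such samples are needed to reach accuracy $\epsilon$.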
