An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

Zhifa Ke; Zaiwen Wen; Junyu Zhang

arXiv:2405.04017 · cs.LG · May 8, 2024

TL;DR

This paper provides a new finite-time analysis of neural TD learning with a general $L$-layer network, achieving an improved sample complexity of $\tilde{O}(\epsilon^{-1})$ under Markovian sampling, compared with the best previously known $\tilde{O}(\epsilon^{-2})$.

Contribution

It introduces new proof techniques and derives the first finite-time analysis with $\tilde{O}(\epsilon^{-1})$ sample complexity for neural TD under Markovian sampling.

Findings

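1. Achieves $\tilde{O}(\epsilon^{-1})$ sample complexity for neural TD with a general $L$-layer network.

2. Develops new proof techniques for the non-asymptotic analysis.

3. First finite-time analysis of neural TD to achieve $\tilde{O}(\epsilon^{-1})$ complexity under Markovian sampling; the best previously known rate was $\tilde{O}(\epsilon^{-2})$.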

Abstract

Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network. New proof techniques are developed and an improved $\tilde{O}(\epsilon^{-1})$ sample complexity is derived. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an $\tilde{O}(\epsilon^{-1})$ complexity under Markovian sampling, as opposed to the best known $\tilde{O}(\epsilon^{-2})$ complexity in the existing literature.
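For readers unfamiliar with the setting, the following is a minimal sketch of semi-gradient TD(0) with a neural network, sampled along a single trajectory (Markovian sampling). It is not the algorithm or network analyzed in the paper: it assumes a two-layer ReLU network rather than a general $L$-layer one, estimates state values on a hypothetical 5-state toy chain instead of action values, and omits any projection or step-size schedule the analysis may rely on. The function name `neural_td0`, the chain, and all hyperparameters are illustrative assumptions.

```python
import math
import random


def neural_td0(num_steps=20000, gamma=0.9, lr=0.05, width=16, seed=0):
    """Semi-gradient TD(0) with a two-layer ReLU network on a toy Markov chain.

    Illustrative only: the chain, rewards, and hyperparameters are made up.
    """
    rng = random.Random(seed)
    n = 5  # number of states in the hypothetical toy chain

    # Random row-stochastic transition matrix and deterministic rewards.
    P = []
    for _ in range(n):
        row = [rng.random() for _ in range(n)]
        total = sum(row)
        P.append([p / total for p in row])
    r = [rng.random() for _ in range(n)]

    # Reference values from iterating the Bellman equation V = r + gamma * P V.
    v_true = [0.0] * n
    for _ in range(1000):
        v_true = [
            r[s] + gamma * sum(P[s][t] * v_true[t] for t in range(n))
            for s in range(n)
        ]

    # With one-hot inputs, column s of W1 holds the hidden pre-activations
    # for state s. The output is scaled by 1 / sqrt(width).
    W1 = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(width)]
    w2 = [rng.gauss(0, 1) for _ in range(width)]
    scale = 1.0 / math.sqrt(width)

    def value(s):
        return scale * sum(w2[j] * max(W1[j][s], 0.0) for j in range(width))

    v_init = [value(s) for s in range(n)]

    s = 0
    for _ in range(num_steps):
        # Markovian sampling: the next state depends on the current one,
        # so consecutive samples are correlated rather than i.i.d.
        s_next = rng.choices(range(n), weights=P[s])[0]

        # TD error; the bootstrapped target is treated as a constant
        # (semi-gradient), so only value(s) is differentiated.
        delta = r[s] + gamma * value(s_next) - value(s)

        for j in range(width):
            pre = W1[j][s]
            grad_w2 = scale * max(pre, 0.0)
            grad_w1 = scale * w2[j] if pre > 0 else 0.0
            w2[j] += lr * delta * grad_w2
            W1[j][s] += lr * delta * grad_w1

        s = s_next

    v_final = [value(s) for s in range(n)]
    return v_init, v_final, v_true


if __name__ == "__main__":
    v_init, v_final, v_true = neural_td0()
    err = lambda v: max(abs(a - b) for a, b in zip(v, v_true))
    print("max error before training:", err(v_init))
    print("max error after training: ", err(v_final))
```

The sample complexity in the abstract counts how many such transitions are needed to reach accuracy $\epsilon$; this toy run only illustrates the update rule and says nothing about the rate itself.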

Taxonomy

Topics: Music and Audio Processing