Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation

Semih Cayci; Siddhartha Satpathi; Niao He; R. Srikant

arXiv:2103.01391 · cs.LG · August 9, 2021

TL;DR

This paper establishes new convergence bounds for neural network-based temporal difference learning, demonstrating that max-norm regularization significantly enhances sample efficiency and reduces overparameterization needs.

Contribution

It provides the first convergence bounds for projection-free and max-norm regularized Neural TD learning, highlighting the benefits of max-norm regularization.

Findings

1. Max-norm regularization improves the sample complexity bound.

2. Max-norm regularization reduces the amount of overparameterization required.

3. The analysis introduces a novel Lyapunov drift technique.

Abstract

In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, Neural TD learning. We consider two practically used algorithms, projection-free and max-norm regularized Neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our results is that max-norm regularization can dramatically improve the performance of TD learning algorithms, both in terms of sample complexity and overparameterization. In particular, we prove that max-norm regularization improves state-of-the-art sample complexity and overparameterization bounds. The results in this work rely on a novel Lyapunov drift analysis of the network parameters as a stopped and controlled random process.
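To make the two algorithm variants named in the abstract concrete, here is a minimal sketch of a semi-gradient TD(0) update for a neural value function, with an optional max-norm step. It is an illustration under stated assumptions, not the authors' implementation: the two-layer ReLU architecture, the fixed ±1 output weights, the 1/sqrt(m) scaling, the choice to constrain each hidden unit to a ball around its initial weights, and all function and parameter names (`init_network`, `neural_td_step`, `radius`, and so on) are hypothetical and do not come from the abstract.

```python
import numpy as np

def init_network(m, d, rng):
    """Two-layer ReLU network with m hidden units (assumed architecture).
    Output weights a_i in {-1, +1} stay fixed; only W is trained."""
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    return W, a

def value(W, a, x):
    """f(x; W) = (1/sqrt(m)) * sum_i a_i * relu(w_i . x)."""
    m = W.shape[0]
    return float(a @ np.maximum(W @ x, 0.0)) / np.sqrt(m)

def grad_W(W, a, x):
    """Gradient of value() w.r.t. W: row i is a_i * 1{w_i . x > 0} * x / sqrt(m)."""
    m = W.shape[0]
    active = (W @ x > 0.0).astype(float)
    return (a * active)[:, None] * x[None, :] / np.sqrt(m)

def max_norm_project(W, W0, radius):
    """Assumed form of max-norm regularization: pull each hidden unit's
    weight vector back into a ball of the given radius around its initial value."""
    diff = W - W0
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W0 + diff * scale

def neural_td_step(W, W0, a, x, r, x_next, gamma, lr, radius=None):
    """One semi-gradient TD(0) step on the transition (x, r, x_next).
    radius=None gives the projection-free variant."""
    td_error = r + gamma * value(W, a, x_next) - value(W, a, x)
    W_new = W + lr * td_error * grad_W(W, a, x)
    if radius is not None:
        W_new = max_norm_project(W_new, W0, radius)
    return W_new
```

Calling `neural_td_step` with `radius=None` corresponds to the projection-free variant in this sketch, while passing a positive `radius` keeps every hidden unit's weights within that distance of initialization after each update.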
