Finite Depth and Width Corrections to the Neural Tangent Kernel

Boris Hanin; Mihai Nica

arXiv:1909.05989·cs.LG·September 16, 2019·21 cites


TL;DR

This paper analyzes how finite depth and width affect the neural tangent kernel (NTK) of randomly initialized ReLU networks. Both the NTK's standard deviation at initialization and the mean of its first SGD update are exponential in the depth-to-width ratio, so the kernel is neither deterministic nor frozen when depth and width grow together.

Contribution

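It proves the precise scaling, at finite depth and width, of the NTK's mean and variance at initialization. It shows that the NTK stays random and evolves during training when depth and width tend to infinity together, which suggests data-dependent feature learning in deep, wide networks.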

Findings

01

NTK variance is exponential in depth-to-width ratio

02

NTK evolves during training in deep, wide networks

03

Results suggest deep, wide networks can learn data-dependent features even in the lazy training regime

Abstract

We prove the precise scaling, at finite depth and width, for the mean and variance of the neural tangent kernel (NTK) in a randomly initialized ReLU network. The standard deviation is exponential in the ratio of network depth to width. Thus, even in the limit of infinite overparameterization, the NTK is not deterministic if depth and width simultaneously tend to infinity. Moreover, we prove that for such deep and wide networks, the NTK has a non-trivial evolution during training by showing that the mean of its first SGD update is also exponential in the ratio of network depth to width. This is in sharp contrast to the regime where depth is fixed and network width is very large. Our results suggest that, unlike relatively shallow and wide networks, deep and wide ReLU networks are capable of learning data-dependent features even in the so-called lazy training regime.
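
The claim that the NTK stays random when depth and width grow together can be probed numerically. What follows is a minimal sketch, not the authors' code. It assumes a bias-free, fully connected ReLU network with scalar output and He initialization (weight variance 2/fan-in), and it estimates the ratio E[K^2] / E[K]^2 for the on-diagonal kernel K(x, x) by Monte Carlo over random initializations. The function names `ntk_diagonal` and `relative_second_moment`, the all-ones input, and the sample sizes are illustrative choices, not taken from the paper.

```python
import math
import random


def ntk_diagonal(x, widths, rng):
    """One random draw of K(x, x) = sum over weights of (df/dW)^2.

    Assumptions (not from the paper's text): no biases, scalar output,
    every weight drawn from N(0, 2 / fan_in).
    """
    sizes = [len(x)] + list(widths) + [1]
    acts = [list(x)]          # acts[l] is the input to weight matrix l
    masks, weights = [], []

    # Forward pass: sample each layer's weights, apply ReLU on hidden layers.
    for l in range(1, len(sizes)):
        fan_in, fan_out = sizes[l - 1], sizes[l]
        std = math.sqrt(2.0 / fan_in)
        W = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        pre = [sum(w * a for w, a in zip(row, acts[-1])) for row in W]
        if l == len(sizes) - 1:
            mask = [1.0] * fan_out            # linear output layer
        else:
            mask = [1.0 if z > 0 else 0.0 for z in pre]
        weights.append(W)
        masks.append(mask)
        acts.append([z * m for z, m in zip(pre, mask)])

    # Backward pass. For weight matrix l, df/dW[i][j] = delta[i] * acts[l][j],
    # so its squared gradients sum to |delta|^2 * |acts[l]|^2.
    delta = [1.0]             # df / d(output pre-activation)
    kernel = 0.0
    for l in range(len(weights) - 1, -1, -1):
        a_in = acts[l]
        kernel += sum(d * d for d in delta) * sum(a * a for a in a_in)
        if l > 0:
            W = weights[l]
            back = [sum(W[i][j] * delta[i] for i in range(len(delta)))
                    for j in range(len(a_in))]
            delta = [b * m for b, m in zip(back, masks[l - 1])]
    return kernel


def relative_second_moment(depth, width, n_draws=200, input_dim=4, seed=0):
    """Monte Carlo estimate of E[K^2] / E[K]^2 over random initializations."""
    rng = random.Random(seed)
    x = [1.0] * input_dim
    samples = [ntk_diagonal(x, [width] * depth, rng) for _ in range(n_draws)]
    mean = sum(samples) / n_draws
    if mean == 0.0:
        raise ValueError("every sampled network was dead; increase width or n_draws")
    second = sum(k * k for k in samples) / n_draws
    return second / (mean * mean)


if __name__ == "__main__":
    # Depth/width ratio grows from 1/16 to 4 across these configurations.
    for depth, width in [(2, 32), (8, 8), (16, 4)]:
        ratio = relative_second_moment(depth, width)
        print(f"depth={depth:2d} width={width:2d} E[K^2]/E[K]^2 ~ {ratio:.2f}")
```

A ratio close to 1 means the kernel is nearly deterministic at initialization, while a ratio that grows with depth/width is the behavior the abstract describes. The estimate is noisy for deep, narrow networks because the distribution of K is heavy-tailed, so it should be read qualitatively.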


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM

Methods: Neural Tangent Kernel · Stochastic Gradient Descent