Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization
Mariia Seleznova, Gitta Kutyniok

TL;DR
This paper studies the Neural Tangent Kernel (NTK) of fully-connected ReLU networks whose depth is comparable to their width. It shows that the NTK's properties depend on the depth-to-width ratio and on the distribution of parameters at initialization, and that they differ across the ordered, chaotic, and edge-of-chaos (EOC) phases.
Contribution
It derives exact expressions for the NTK dispersion in the infinite-depth-and-width limit in all three phases (ordered, chaotic, and EOC), extending the analysis beyond the infinite-width limit.
Findings
NTK variability grows exponentially with depth at the EOC and in the chaotic phase.
The NTK of deep networks may stay constant during training only in the ordered phase.
NTK properties depend significantly on the depth-to-width ratio and on the distribution of parameters at initialization.
Abstract
Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural networks due to the famous result by Jacot et al. (2018): in the infinite-width limit, the NTK is deterministic and constant during training. However, this result cannot explain the behavior of deep networks, since it generally does not hold if depth and width tend to infinity simultaneously. In this paper, we study the NTK of fully-connected ReLU networks with depth comparable to width. We prove that the NTK properties depend significantly on the depth-to-width ratio and the distribution of parameters at initialization. In fact, our results indicate the importance of the three phases in the hyperparameter space identified in Poole et al. (2016): ordered, chaotic and the edge of chaos (EOC). We derive exact expressions for the NTK dispersion in the infinite-depth-and-width limit in all three phases and…
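The dispersion the abstract refers to can be probed numerically. The following is a minimal sketch, not the authors' code. It makes several assumptions that are not stated in the abstract: a bias-free ReLU network in NTK parameterization with scalar output, a single diagonal NTK entry Θ(x, x), and dispersion measured as Var[Θ] / E[Θ]² over random initializations. The function names `ntk_diagonal` and `ntk_dispersion` are hypothetical, and the choice of σ_w² = 2 as the EOC reflects the commonly cited value for bias-free ReLU networks.

```python
import numpy as np


def ntk_diagonal(x, depth, width, sigma_w2, rng):
    """Theta(x, x) for one randomly initialized bias-free ReLU MLP.

    Assumed setup: layer l computes sqrt(sigma_w2 / fan_in) * W_l @ input,
    with W_l entries drawn from N(0, 1) and a scalar output.
    """
    acts = [x]      # acts[l] is the input to layer l
    masks = []      # masks[l] is the ReLU derivative at hidden layer l
    weights = []
    h = x
    for l in range(depth):
        fan_in = h.shape[0]
        fan_out = width if l < depth - 1 else 1
        W = rng.standard_normal((fan_out, fan_in))
        pre = np.sqrt(sigma_w2 / fan_in) * (W @ h)
        weights.append(W)
        if l < depth - 1:
            mask = (pre > 0).astype(float)
            masks.append(mask)
            h = pre * mask
            acts.append(h)

    # Backward pass. delta is d(output)/d(pre-activation of layer l).
    # The squared gradient norm w.r.t. W_l factorizes as
    # (sigma_w2 / fan_in) * ||delta||^2 * ||acts[l]||^2.
    delta = np.ones(1)
    theta = 0.0
    for l in reversed(range(depth)):
        fan_in = acts[l].shape[0]
        theta += (sigma_w2 / fan_in) * np.sum(delta**2) * np.sum(acts[l] ** 2)
        if l > 0:
            back = np.sqrt(sigma_w2 / fan_in) * (weights[l].T @ delta)
            delta = back * masks[l - 1]
    return theta


def ntk_dispersion(depth, width, sigma_w2, n_init=200, input_dim=16, seed=0):
    """Empirical Var[Theta] / E[Theta]^2 over n_init random initializations."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(input_dim)
    x *= np.sqrt(input_dim) / np.linalg.norm(x)  # normalize so ||x||^2 = input_dim
    samples = np.array(
        [ntk_diagonal(x, depth, width, sigma_w2, rng) for _ in range(n_init)]
    )
    return samples.var() / samples.mean() ** 2


if __name__ == "__main__":
    width = 32
    # Assumed phase labels for bias-free ReLU: sigma_w2 < 2 ordered,
    # sigma_w2 = 2 EOC, sigma_w2 > 2 chaotic.
    for sigma_w2, phase in [(1.0, "ordered"), (2.0, "EOC"), (3.0, "chaotic")]:
        for depth in (4, 16, 32):
            d = ntk_dispersion(depth, width, sigma_w2)
            print(f"{phase:8s} depth/width={depth / width:.3f} dispersion={d:.3f}")
```

Varying `depth` at fixed `width` changes the depth-to-width ratio, which is the quantity the abstract identifies as governing the NTK's behavior. The Monte Carlo estimate is noisy for small `n_init`, so any trend it shows is only indicative.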
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Computational Physics and Python Applications
Methods: Neural Tangent Kernel
