Dynamics of Deep Neural Networks and Neural Tangent Hierarchy
Jiaoyang Huang, Horng-Tzer Yau

TL;DR
This paper develops a hierarchy of differential equations describing how the neural tangent kernel (NTK) evolves during training of finite-width deep neural networks, offering a likely explanation for the performance gap between such networks and kernel regression with the limiting NTK.
Contribution
It introduces the neural tangent hierarchy (NTH) to model NTK dynamics and proves its approximation accuracy, advancing understanding of finite-width neural network training.
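For orientation, the structure of such a hierarchy can be sketched as follows. This is a schematic under assumed conventions: squared loss over n training points (x_β, y_β), continuous-time gradient flow, and a 1/n normalization. The symbols f_t, K_t^{(r)}, and the indices α, β are notation chosen here for illustration, not taken from this summary, and the paper's exact constants may differ.

```latex
% Network output evolves through the NTK, written here as K_t^{(2)}:
\partial_t f_t(x) = -\frac{1}{n} \sum_{\beta=1}^{n} K_t^{(2)}(x, x_\beta)\,\bigl(f_t(x_\beta) - y_\beta\bigr)

% Each kernel in the hierarchy is driven by the next-order kernel:
\partial_t K_t^{(r)}(x_{\alpha_1}, \dots, x_{\alpha_r})
  = -\frac{1}{n} \sum_{\beta=1}^{n} K_t^{(r+1)}(x_{\alpha_1}, \dots, x_{\alpha_r}, x_\beta)\,\bigl(f_t(x_\beta) - y_\beta\bigr),
  \qquad r \ge 2
```

If K_t^{(2)} is frozen at its initial value, the first equation reduces to kernel gradient flow, which corresponds to the infinite-width picture; the higher-order kernels encode the finite-width corrections.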
Findings
NTH captures NTK evolution during training.
Finite-width effects cause the NTK to change during training, which likely affects performance.
The performance gap between deep networks and kernel regression with the limiting NTK likely originates from these NTK dynamics.
Abstract
The evolution of a deep neural network trained by gradient descent can be described by its neural tangent kernel (NTK) as introduced in [20], where it was proven that in the infinite width limit the NTK converges to an explicit limiting kernel and stays constant during training. The NTK was also implicit in some other recent papers [6,13,14]. In the overparameterized regime, a fully-trained deep neural network is indeed equivalent to the kernel regression predictor using the limiting NTK. Moreover, gradient descent achieves zero training loss for a deep overparameterized neural network. However, it was observed in [5] that there is a performance gap between the kernel regression using the limiting NTK and the deep neural networks. This performance gap is likely to originate from the change of the NTK along training due to the finite width effect. The change of the NTK along the…
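The finite-width effect described in the abstract can be observed numerically. The following is a minimal sketch, not the paper's setting or method: it assumes a two-layer tanh network f(x) = aᵀ tanh(Wx) / √m (the paper treats deep networks), full-batch gradient descent on squared loss, and synthetic data. All function names (`forward`, `jacobian`, `empirical_ntk`, `ntk_relative_change`) and hyperparameters are hypothetical choices made for illustration. It computes the empirical NTK as the Gram matrix of parameter gradients and measures how far the kernel moves during training.

```python
import numpy as np

def forward(W, a, X):
    # Two-layer network in NTK parameterization: f(x) = a^T tanh(W x) / sqrt(m).
    m = W.shape[0]
    return np.tanh(X @ W.T) @ a / np.sqrt(m)

def jacobian(W, a, X):
    # Gradient of f(x_i) with respect to all parameters, one row per input.
    # Column layout: first m entries for a, then m*d entries for W (row-major).
    m = W.shape[0]
    H = np.tanh(X @ W.T)                      # (n, m) hidden activations
    dH = 1.0 - H ** 2                         # (n, m) tanh derivative
    J_a = H / np.sqrt(m)                      # df/da_r
    J_W = (dH * a)[:, :, None] * X[:, None, :] / np.sqrt(m)  # df/dW[r, :]
    return np.concatenate([J_a, J_W.reshape(len(X), -1)], axis=1)

def empirical_ntk(W, a, X):
    # Empirical NTK on the training set: K[i, j] = <grad f(x_i), grad f(x_j)>.
    J = jacobian(W, a, X)
    return J @ J.T

def ntk_relative_change(m, X, y, steps=200, lr=0.1, seed=0):
    # Train a width-m network and return ||K_end - K_0||_F / ||K_0||_F.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, X.shape[1]))
    a = rng.standard_normal(m)
    K0 = empirical_ntk(W, a, X)
    n = len(X)
    for _ in range(steps):
        J = jacobian(W, a, X)
        resid = forward(W, a, X) - y
        grad = J.T @ resid / n                # gradient of (1/2n) * sum(resid^2)
        a = a - lr * grad[:m]
        W = W - lr * grad[m:].reshape(W.shape)
    K1 = empirical_ntk(W, a, X)
    return np.linalg.norm(K1 - K0) / np.linalg.norm(K0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((8, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
    y = rng.standard_normal(8)                      # synthetic targets
    for m in (16, 256, 4096):
        print(m, ntk_relative_change(m, X, y))
```

In this toy setting, the relative kernel change should shrink as the width m grows, consistent with the NTK staying constant in the infinite width limit and moving at finite width.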
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Neural Tangent Kernel
