Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli

TL;DR
This empirical study of deep neural network training finds a rapid, chaotic early phase that determines the final low-loss basin and drives fast changes in the neural tangent kernel (NTK), linking NTK evolution to loss landscape geometry.
Contribution
It provides a large-scale phenomenological analysis linking loss landscape geometry and NTK evolution, uncovering a universal chaotic-to-stable transition in early training.
Findings
NTK changes rapidly during initial epochs, learning useful features.
After the transient, the NTK changes at a constant velocity, and its performance matches that of full network training.
Early chaotic transient determines the final low-loss basin.
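The findings above speak of the NTK changing "rapidly" and then at "constant velocity". One way to make that concrete is to compare kernel (Gram) matrices saved at successive checkpoints with a scale-invariant distance. The sketch below uses one minus the cosine similarity of the two matrices as that distance and a finite difference as the velocity. The function names, the `dt` parameter, and the toy kernels are illustrative assumptions, not code from the paper.

```python
import numpy as np

def kernel_distance(k_a, k_b):
    """1 - cosine similarity between two kernel matrices (scale-invariant)."""
    inner = np.sum(k_a * k_b)  # Frobenius inner product
    # np.linalg.norm of a 2-D array defaults to the Frobenius norm.
    return 1.0 - inner / (np.linalg.norm(k_a) * np.linalg.norm(k_b))

def kernel_velocity(kernels, dt=1.0):
    """Finite-difference kernel distance between consecutive checkpoints."""
    return [kernel_distance(a, b) / dt for a, b in zip(kernels[:-1], kernels[1:])]

# Toy usage: random feature matrices stand in for checkpointed kernels.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(5, 8)) for _ in range(3)]
toy_kernels = [f @ f.T for f in feats]
velocities = kernel_velocity(toy_kernels)
```

Because the distance ignores overall scale, a kernel that only grows or shrinks in magnitude registers as not having moved; only changes in its structure count.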
Abstract
In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics. In multiple neural architectures and datasets, we find these diverse measures evolve in a highly correlated manner, revealing a universal picture of the deep learning process. In this picture, deep network training exhibits a…
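The abstract's "linear weight expansion of the network at initialization" and the data-dependent NTK can be illustrated on a toy model. The sketch below assumes a one-hidden-layer tanh network with scalar output and hand-written gradients; it is far smaller than the architectures studied in the paper, and all function names are hypothetical. The empirical NTK is the Gram matrix of per-example parameter gradients, and the linearized model is the first-order Taylor expansion in the weights around the initial parameters.

```python
import numpy as np

def forward(params, x):
    """f(x) = w2 . tanh(W1 x) / sqrt(width), one scalar output per input row."""
    w1, w2 = params
    return np.tanh(x @ w1.T) @ w2 / np.sqrt(w2.size)

def jacobian(params, x):
    """Gradient of f w.r.t. all parameters, one row per input: shape (n, P)."""
    w1, w2 = params
    h = np.tanh(x @ w1.T)                      # (n, width)
    scale = 1.0 / np.sqrt(w2.size)
    d_w2 = h * scale                           # df/dw2, shape (n, width)
    # df/dW1[k, :] = w2[k] * (1 - h_k^2) * x / sqrt(width), shape (n, width, d)
    d_w1 = ((1.0 - h**2) * w2)[:, :, None] * x[:, None, :] * scale
    return np.concatenate([d_w1.reshape(len(x), -1), d_w2], axis=1)

def empirical_ntk(params, x):
    """NTK on the inputs x: inner products of per-example parameter gradients."""
    j = jacobian(params, x)
    return j @ j.T

def linearized_forward(params0, params, x):
    """First-order expansion of f in the weights around params0."""
    delta = np.concatenate([(params[0] - params0[0]).ravel(),
                            params[1] - params0[1]])
    return forward(params0, x) + jacobian(params0, x) @ delta

# Toy usage: compare the network with its linearization after a small weight change.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
params0 = (rng.normal(size=(16, 3)), rng.normal(size=16))
params = (params0[0] + 1e-4 * rng.normal(size=(16, 3)),
          params0[1] + 1e-4 * rng.normal(size=16))
ntk = empirical_ntk(params0, x)
gap = np.max(np.abs(forward(params, x) - linearized_forward(params0, params, x)))
```

For small weight changes the gap is second order in the perturbation. Recomputing `empirical_ntk` at later parameters, instead of only at `params0`, gives the time-evolving, data-dependent kernel that the abstract refers to.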
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Neural Tangent Kernel
