Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel

Stanislav Fort; Gintare Karolina Dziugaite; Mansheej Paul; Sepideh Kharaghani; Daniel M. Roy; Surya Ganguli

arXiv:2010.15110 · cs.LG · October 29, 2020 · 22 citations

TL;DR

This study empirically analyzes the training dynamics of deep neural networks. It finds that a rapid, chaotic early phase of training determines the final low-loss basin and coincides with fast changes in the neural tangent kernel (NTK), linking NTK evolution to loss landscape geometry.

Contribution

The paper provides a large-scale phenomenological analysis that links loss landscape geometry to NTK evolution and uncovers a transition from a chaotic to a stable regime early in training, observed across multiple architectures and datasets.

Findings

01

The NTK changes rapidly during the initial epochs, learning useful features from the training data.

02

After this transient, the NTK changes at constant velocity, and its performance comes to match that of full network training.

03

Early chaotic transient determines the final low-loss basin.
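
The "constant velocity" in finding 02 describes how fast the kernel changes from one training checkpoint to the next. The following is a minimal sketch of one way to quantify that, assuming a scale-invariant, cosine-style distance between kernel (Gram) matrices and a finite-difference estimate of its rate of change. The names `kernel_distance` and `kernel_velocity` and the `dt` parameter are illustrative assumptions, not taken from this page.

```python
import math


def kernel_distance(k_a, k_b):
    """Return 1 - cosine similarity between two kernel matrices.

    Each kernel is a list of rows. The result is 0 when the kernels are
    equal up to a positive scale factor and 1 when they are orthogonal
    under the Frobenius inner product.
    """
    inner = sum(a * b
                for row_a, row_b in zip(k_a, k_b)
                for a, b in zip(row_a, row_b))
    norm_a = math.sqrt(sum(a * a for row in k_a for a in row))
    norm_b = math.sqrt(sum(b * b for row in k_b for b in row))
    return 1.0 - inner / (norm_a * norm_b)


def kernel_velocity(kernels, dt=1.0):
    """Finite-difference rate of kernel change between consecutive checkpoints.

    `kernels` is a list of kernel matrices saved at evenly spaced times;
    `dt` is the (assumed) spacing between checkpoints.
    """
    return [kernel_distance(kernels[i], kernels[i + 1]) / dt
            for i in range(len(kernels) - 1)]
```

Under this definition, a roughly flat sequence returned by `kernel_velocity` over late checkpoints would correspond to the constant-velocity regime, while large early values would correspond to the rapid initial transient.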

Abstract

In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics. In multiple neural architectures and datasets, we find these diverse measures evolve in a highly correlated manner, revealing a universal picture of the deep learning process. In this picture, deep network training exhibits a…
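
The two objects the abstract refers to, the linear weight expansion of a network and its data-dependent NTK, can be illustrated on a toy model. The sketch below assumes a one-hidden-layer tanh network with scalar input, f(x; a, b) = Σ_j a_j tanh(b_j x), chosen only because its gradients can be written by hand; it is not one of the architectures studied in the paper, and all function names are hypothetical.

```python
import math


def forward(params, x):
    """Toy network f(x) = sum_j a_j * tanh(b_j * x); params = (a, b) lists."""
    a, b = params
    return sum(aj * math.tanh(bj * x) for aj, bj in zip(a, b))


def grad(params, x):
    """Gradient of f(x) with respect to all weights, ordered as a then b."""
    a, b = params
    t = [math.tanh(bj * x) for bj in b]
    grad_a = t                                       # df/da_j = tanh(b_j x)
    grad_b = [aj * x * (1.0 - tj * tj)               # df/db_j = a_j x sech^2(b_j x)
              for aj, tj in zip(a, t)]
    return grad_a + grad_b


def empirical_ntk(params, xs):
    """Gram matrix K[i][j] = <grad f(x_i), grad f(x_j)> at the given weights."""
    grads = [grad(params, x) for x in xs]
    return [[sum(gi * gj for gi, gj in zip(g1, g2)) for g2 in grads]
            for g1 in grads]


def linearized_forward(params0, params, x):
    """First-order expansion of f around params0, evaluated at params."""
    a0, b0 = params0
    a, b = params
    delta = [p - p0 for p, p0 in zip(a + b, a0 + b0)]
    return forward(params0, x) + sum(g * d for g, d in zip(grad(params0, x), delta))
```

Evaluating `empirical_ntk` at the current weights rather than only at initialization gives a kernel that changes over training, and the gap between `forward` and `linearized_forward` grows as the weights move away from `params0`.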

Videos

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Neural Tangent Kernel