Neural Networks as Kernel Learners: The Silent Alignment Effect

Alexander Atanasov; Blake Bordelon; Cengiz Pehlevan

arXiv:2111.00034 · stat.ML · February 7, 2022 · 1 citation

TL;DR

This paper introduces silent alignment: in the rich feature learning regime, a network's tangent kernel can change its eigenstructure early in training, while the kernel is still small and before the loss appreciably decreases, and afterwards grow only in overall scale. The trained network then behaves like a kernel machine with a data-dependent kernel.

Contribution

The paper identifies the silent alignment effect and shows how neural networks develop data-dependent kernels during training, with an analytical treatment for linear networks and empirical evidence for deep networks.

Findings

01 Silent alignment occurs in homogeneous networks with small initialization.

02 The tangent kernel develops a low-rank structure early in training.

03 Non-whitened data can weaken the silent alignment effect.
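
For reference, the tangent kernel named in these findings can be written out as below, together with a short argument for why a kernel that changes only in overall scale leads to a kernel regression solution. This is a sketch under stated assumptions, not the paper's own derivation: the symbols \eta (learning rate), c(t) (a positive scale factor), and the unnormalized sum over training points are notation chosen here, and the last step assumes the network output is approximately zero at initialization and that the final Gram matrix is invertible.

```latex
% Tangent kernel of a network f(x; \theta) at training time t
K_t(x, x') = \nabla_\theta f(x; \theta_t) \cdot \nabla_\theta f(x'; \theta_t)

% Gradient flow on the squared loss over training points (x_\mu, y_\mu)
\frac{d f_t(x)}{dt} = -\eta \sum_\mu K_t(x, x_\mu) \bigl( f_t(x_\mu) - y_\mu \bigr)

% Idealized silent alignment: once the loss starts to move, only the scale c(t) > 0 changes
K_t(x, x') \approx c(t) \, K_\infty(x, x')
\quad \Longrightarrow \quad
f_\infty(x) \approx k_\infty(x)^\top K_\infty^{-1} y,
\qquad [k_\infty(x)]_\mu = K_\infty(x, x_\mu), \quad f_0 \approx 0
```

The implication holds because a time-dependent scale only reparametrizes time in the flow, and the kernel regression predictor is unchanged when the kernel is multiplied by a constant.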

Abstract

Neural networks in the lazy training regime converge to kernel machines. Can neural networks in the rich feature learning regime learn a kernel machine with a data-dependent kernel? We demonstrate that this can indeed happen due to a phenomenon we term silent alignment, which requires that the tangent kernel of a network evolves in eigenstructure while small and before the loss appreciably decreases, and grows only in overall scale afterwards. We show that such an effect takes place in homogeneous neural networks with small initialization and whitened data. We provide an analytical treatment of this effect in the linear network case. In general, we find that the kernel develops a low-rank contribution in the early phase of training, and then evolves in overall scale, yielding a function equivalent to a kernel regression solution with the final network's tangent kernel. The early spectral…
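
The effect described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' code: it trains a two-layer linear network f(x) = a^T W x with full-batch gradient descent from a small initialization on inputs whitened up to scale (X X^T = I), and records the loss alongside the cosine similarity between the tangent kernel Gram matrix and y y^T. The function names, the sizes d, h, P, the initialization scale sigma, the learning rate, and the alignment measure are all assumptions chosen for illustration.

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Cosine similarity between the Gram matrix K and the rank-one matrix y y^T."""
    return float(y @ K @ y / (np.linalg.norm(K) * (y @ y)))

def train_two_layer_linear(d=5, h=50, P=20, sigma=1e-3, lr=0.05, steps=2000, seed=0):
    """Gradient descent on f(x) = a^T W x; returns per-step losses and alignments."""
    rng = np.random.default_rng(seed)

    # Inputs whitened up to scale: columns of X are data points and X X^T = I_d.
    Q, _ = np.linalg.qr(rng.standard_normal((P, d)))  # Q is P x d, orthonormal columns
    X = Q.T                                           # d x P
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)                      # unit-norm linear teacher (assumed)
    y = X.T @ beta                                    # P targets

    # Small initialization of both layers.
    W = sigma * rng.standard_normal((h, d))
    a = sigma * rng.standard_normal(h)

    losses, alignments = [], []
    for _ in range(steps):
        # Tangent kernel of this model: K(x, x') = x^T (W^T W + |a|^2 I) x'.
        M = W.T @ W + (a @ a) * np.eye(d)
        K = X.T @ M @ X
        resid = X.T @ (W.T @ a) - y                   # predictions minus targets
        losses.append(0.5 * float(resid @ resid))
        alignments.append(kernel_target_alignment(K, y))

        # Compute both gradients before updating either parameter.
        g = X @ resid
        grad_a = W @ g
        grad_W = np.outer(a, g)
        a -= lr * grad_a
        W -= lr * grad_W
    return np.array(losses), np.array(alignments)

if __name__ == "__main__":
    losses, alignments = train_two_layer_linear()
    # First step at which the loss has dropped by 10% from its initial value.
    early = int(np.argmax(losses < 0.9 * losses[0]))
    print("initial alignment:", alignments[0])
    print("alignment when loss first drops 10%:", alignments[early])
    print("final alignment:", alignments[-1], "final loss:", losses[-1])
```

In this idealized setting one would expect the alignment at the moment the loss first moves to already be close to its final value, which is the "silent" part of the effect. With non-whitened inputs or a larger sigma the picture may be less clean, in line with the third finding above.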

Videos

Neural Networks as Kernel Learners: The Silent Alignment Effect (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Gaussian Processes and Bayesian Inference