Can Kernel Methods Explain How the Data Affects Neural Collapse?
Vignesh Kothapalli, Tom Tirer

TL;DR
This paper investigates how kernel methods, especially data-aware kernels, can explain the Neural Collapse phenomenon in neural networks, revealing limitations of data-independent kernels like NTK and highlighting the importance of data adaptivity.
Contribution
It introduces a kernel-based framework for analyzing Neural Collapse, compares NTK and NNGP kernels, and explores data-aware kernels, advancing understanding of data effects on neural network feature collapse.
Findings
NTK does not produce more collapsed features than NNGP for Gaussian data.
Data-aware kernels can better model feature learning and Neural Collapse.
Activation functions influence the degree of Neural Collapse, with Erf yielding lower NC1 than ReLU.
Abstract
A vast amount of literature has recently focused on the "Neural Collapse" (NC) phenomenon, which emerges when training neural network (NN) classifiers beyond the zero training error point. The core component of NC is the decrease in the within-class variability of the network's deepest features, dubbed as NC1. The theoretical works that study NC are typically based on simplified unconstrained features models (UFMs) that mask any effect of the data on the extent of collapse. To address this limitation of UFMs, this paper explores the possibility of analyzing NC1 using kernels associated with shallow NNs. We begin by formulating an NC1 metric as a function of the kernel. Then, we specialize it to the NN Gaussian Process kernel (NNGP) and the Neural Tangent Kernel (NTK), associated with wide networks at initialization and during gradient-based training with a small learning rate,…
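The idea of writing an NC1 metric as a function of the kernel can be illustrated with a minimal sketch. This page does not give the paper's exact definition, so the code below assumes a common trace-ratio form, NC1 = tr(Σ_W) / tr(Σ_B), where Σ_W and Σ_B are the within-class and between-class covariances of the implicit features. Both traces can be computed from Gram matrix entries alone. The helper names (`relu_nngp_kernel`, `linear_kernel`, `gram`, `kernel_nc1`) are hypothetical, and the ReLU kernel is the standard degree-1 arc-cosine closed form for a single hidden layer with weights w ~ N(0, I) and no bias, which may differ in scaling from the paper's setup.

```python
import math
import random


def relu_nngp_kernel(x, y):
    """E_w[relu(w.x) * relu(w.y)] for w ~ N(0, I): degree-1 arc-cosine kernel.

    Assumption: one hidden layer, no bias, unit-variance weights.
    """
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    theta = math.acos(cos_t)
    return nx * ny * (math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi)


def linear_kernel(x, y):
    """Plain inner product, used as a sanity check with explicit features."""
    return sum(a * b for a, b in zip(x, y))


def gram(xs, kernel):
    """Gram matrix K[i][j] = kernel(xs[i], xs[j]) as a list of lists."""
    return [[kernel(a, b) for b in xs] for a in xs]


def kernel_nc1(K, labels):
    """Assumed NC1 = tr(Sigma_W) / tr(Sigma_B), computed from Gram entries only."""
    n = len(labels)
    classes = {}
    for i, c in enumerate(labels):
        classes.setdefault(c, []).append(i)
    # Sum of squared feature norms.
    diag = sum(K[i][i] for i in range(n))
    # Sum over classes of n_c * ||class mean||^2.
    block = sum(
        sum(K[i][j] for i in idx for j in idx) / len(idx)
        for idx in classes.values()
    )
    # n * ||global mean||^2.
    total = sum(sum(row) for row in K) / n
    tr_within = (diag - block) / n
    tr_between = (block - total) / n
    return tr_within / tr_between


if __name__ == "__main__":
    # Two Gaussian clusters in 2-D as toy data (illustrative only).
    rng = random.Random(0)
    xs = [[rng.gauss(2.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
    xs += [[rng.gauss(-2.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
    labels = [0] * 20 + [1] * 20
    print("NC1 (linear kernel):", kernel_nc1(gram(xs, linear_kernel), labels))
    print("NC1 (ReLU NNGP):   ", kernel_nc1(gram(xs, relu_nngp_kernel), labels))
```

Because `kernel_nc1` only reads the Gram matrix, any other kernel (for instance an NTK or a data-aware kernel) could be plugged in through `gram` without changing the metric code.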
Taxonomy
Topics: Neural Networks and Applications
Methods: Focus · Gaussian Process · Neural Tangent Kernel
