Eigenvalue distribution of the Neural Tangent Kernel in the quadratic scaling
Lucas Benigni, Elliot Paquette

TL;DR
This paper derives the asymptotic eigenvalue distribution of the neural tangent kernel of a two-layer neural network under a specific high-dimensional scaling, and shows that it is the free multiplicative convolution of the Marchenko--Pastur distribution with a deterministic distribution.
Contribution
It provides a novel theoretical characterization of the eigenvalue distribution of the NTK in the quadratic scaling regime, extending understanding of neural network kernel spectra.
Findings
Eigenvalue distribution described as free multiplicative convolution.
Distribution depends on activation function and diagonal matrix D.
Results applicable under specific high-dimensional scaling limits.
Abstract
We compute the asymptotic eigenvalue distribution of the neural tangent kernel of a two-layer neural network under a specific scaling of dimension. Namely, if $X$ is an i.i.d. random matrix, $W$ is an i.i.d. matrix and $D$ is a diagonal matrix with i.i.d. bounded entries, we consider the matrix \[ \mathrm{NTK} = \frac{1}{d}XX^\top \odot \frac{1}{p} \sigma'\left( \frac{1}{\sqrt{d}}XW \right)D^2 \sigma'\left( \frac{1}{\sqrt{d}}XW \right)^\top \] where $\sigma'$ is a pseudo-Lipschitz function applied entrywise, under the quadratic scaling of the dimensions. We describe the asymptotic distribution as the free multiplicative convolution of the Marchenko--Pastur distribution with a deterministic distribution depending on $\sigma$ and $D$.
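As a rough numerical illustration of the matrix defined in the abstract, the sketch below builds one finite-size sample and computes its eigenvalues. Several choices are assumptions made only for illustration and are not taken from the paper: the entries of X and W are drawn as standard Gaussians, the diagonal of D is uniform on [-1, 1], the activation is tanh (so that sigma' = 1 - tanh^2), and the sizes n, d, p are arbitrary. The function name `empirical_ntk` is hypothetical.

```python
import numpy as np

def empirical_ntk(n, d, p, seed=0):
    """Sample the n x n matrix (1/d) X X^T ⊙ (1/p) s'(XW/sqrt(d)) D^2 s'(XW/sqrt(d))^T."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))          # assumed Gaussian data matrix
    W = rng.standard_normal((d, p))          # assumed Gaussian weight matrix
    D = rng.uniform(-1.0, 1.0, size=p)       # assumed bounded i.i.d. diagonal entries

    # Derivative of the assumed tanh activation, applied entrywise.
    S = 1.0 - np.tanh(X @ W / np.sqrt(d)) ** 2   # shape (n, p)

    gram = X @ X.T / d                       # (1/d) X X^T
    feat = (S * D**2) @ S.T / p              # (1/p) S D^2 S^T, D stored as a vector
    return gram * feat                       # entrywise (Hadamard) product

ntk = empirical_ntk(n=200, d=20, p=30)
eigs = np.linalg.eigvalsh(ntk)               # real eigenvalues, ascending order
print(eigs.min(), eigs.max())
```

Both factors are positive semidefinite, so their Hadamard product is as well (Schur product theorem), and the computed eigenvalues should be nonnegative up to rounding. A histogram of `eigs` at larger sizes gives an empirical spectrum that could be compared against the limiting distribution described above.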
