Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
Zhichao Wang, Yizhe Zhu

TL;DR
This paper analyzes the spectral distributions of the empirical conjugate kernel and neural tangent kernel of a two-layer neural network in the ultra-wide regime. It shows that a deformed semicircle law emerges and that the empirical kernels concentrate around their limiting kernels.
Contribution
The paper establishes a deformed semicircle law for the spectral distributions of kernel matrices in ultra-wide neural networks and proves concentration bounds that link the empirical kernels to their limiting kernels.
Findings
A deformed semicircle law describes the limiting spectral distribution of the empirical kernels in ultra-wide neural networks.
Concentration bounds show that the empirical kernels stay close to their limiting counterparts.
An asymptotic analysis characterizes the training and test errors of random feature regression.
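The concentration finding above can be checked numerically in a small case. The following is a minimal sketch, not the paper's construction: it assumes the usual definition of the empirical conjugate kernel, (1/d1) σ(WX)ᵀσ(WX), with W a d1 × d0 standard Gaussian matrix and X a d0 × n data matrix with unit-norm columns. ReLU is chosen only because its limiting kernel has a well-known closed form (the degree-1 arc-cosine kernel); the paper's own assumptions on the activation are not reproduced here. The function names and the sizes d0, n, d1 are illustrative.

```python
import numpy as np

def relu_limiting_ck(X):
    """Closed-form E_w[relu(w.x_i) relu(w.x_j)] for w ~ N(0, I).

    Assumes every column of X has unit Euclidean norm.
    """
    G = np.clip(X.T @ X, -1.0, 1.0)      # cosines of the angles between columns
    theta = np.arccos(G)
    return (np.sin(theta) + (np.pi - theta) * G) / (2.0 * np.pi)

def empirical_ck(X, d1, rng):
    """Empirical CK (1/d1) * relu(WX)^T relu(WX) with Gaussian W of shape (d1, d0)."""
    W = rng.standard_normal((d1, X.shape[0]))
    Y = np.maximum(W @ X, 0.0)           # first-layer post-activations, shape (d1, n)
    return Y.T @ Y / d1

def spectral_error(X, d1, rng):
    """Spectral-norm distance between the empirical and the limiting CK."""
    return np.linalg.norm(empirical_ck(X, d1, rng) - relu_limiting_ck(X), ord=2)

rng = np.random.default_rng(0)
d0, n = 20, 30                            # illustrative sizes (assumptions)
X = rng.standard_normal((d0, n))
X /= np.linalg.norm(X, axis=0)            # normalize columns to unit norm

err_narrow = spectral_error(X, 100, rng)      # width d1 = 100
err_wide = spectral_error(X, 10_000, rng)     # width d1 = 10,000
print(f"d1=100: {err_narrow:.3f}   d1=10000: {err_wide:.3f}")
```

Increasing the width d1 at a fixed sample size n should shrink the spectral-norm error, which is the qualitative behavior the concentration bounds describe.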
Abstract
In this paper, we investigate a two-layer fully connected neural network of the form $f(X) = \frac{1}{\sqrt{d_1}} \boldsymbol{a}^\top \sigma(WX)$, where $X \in \mathbb{R}^{d_0 \times n}$ is a deterministic data matrix, $W \in \mathbb{R}^{d_1 \times d_0}$ and $\boldsymbol{a} \in \mathbb{R}^{d_1}$ are random Gaussian weights, and $\sigma$ is a nonlinear activation function. We study the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$: the empirical conjugate kernel (CK) and neural tangent kernel (NTK), beyond the linear-width regime ($d_1 \asymp n$). We focus on the ultra-wide regime, where the width $d_1$ of the first layer is much larger than the sample size $n$. Under appropriate assumptions on $X$ and $\sigma$, a deformed semicircle law emerges as $d_1/n \to \infty$ and $n \to \infty$. We first prove this limiting law for generalized…
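To see why a semicircle shape appears when the width is much larger than the sample size, it helps to look at the simplest undeformed baseline. The sketch below is an illustration under stated assumptions, not the paper's theorem: it takes a linear activation and a data matrix with orthonormal columns, so that WX has i.i.d. standard Gaussian entries and the expected kernel is the identity. The centering and the sqrt(d1/n) scaling used here are assumptions chosen so that the classical semicircle law on [-2, 2] appears; with a nonlinear activation and general data, the expected kernel is no longer the identity, which is where a deformation of the semicircle would enter. All names and sizes are illustrative.

```python
import numpy as np

def centered_scaled_spectrum(d0, d1, n, rng):
    """Eigenvalues of sqrt(d1/n) * ((1/d1) (WX)^T (WX) - I).

    Assumes a linear activation and X with orthonormal columns (requires n <= d0),
    so the expected kernel X^T X equals the identity.
    """
    # Orthonormal columns via reduced QR of a Gaussian matrix, shape (d0, n).
    X, _ = np.linalg.qr(rng.standard_normal((d0, n)))
    W = rng.standard_normal((d1, d0))     # Gaussian first-layer weights
    Y = W @ X                              # shape (d1, n), i.i.d. N(0, 1) entries
    A = np.sqrt(d1 / n) * (Y.T @ Y / d1 - np.eye(n))
    return np.linalg.eigvalsh(A)           # real eigenvalues, ascending order

rng = np.random.default_rng(1)
d0, d1, n = 150, 10_000, 100               # illustrative sizes with d1 / n = 100
eigs = centered_scaled_spectrum(d0, d1, n, rng)

# For the semicircle law on [-2, 2] the second moment equals 1.
print("second moment:", float(np.mean(eigs ** 2)))
print("spectral edges:", float(eigs.min()), float(eigs.max()))
```

A histogram of `eigs` should look close to a semicircle once d1/n is large; the printed second moment and spectral edges give a quick check without plotting.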
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Matrix Theory and Algorithms
Methods: Test · Neural Tangent Kernel
