Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks

Zhichao Wang; Yizhe Zhu

arXiv:2109.09304 · math.ST · April 17, 2023 · 1 citation

Open Access

TL;DR

This paper studies the limiting spectral distributions of the empirical conjugate kernel (CK) and neural tangent kernel (NTK) of a two-layer network in the ultra-wide regime, where the width $d_1$ is much larger than the sample size $n$. It shows that a deformed semicircle law emerges and gives concentration results linking the empirical kernels to their limiting counterparts.

Contribution

It establishes a deformed semicircle law for the spectral distributions of the empirical CK and NTK in ultra-wide neural networks and provides concentration bounds linking the empirical kernels to their limiting kernels.

Findings

01

A deformed semicircle law describes the limiting spectral distribution of the empirical kernels in ultra-wide neural networks.

02

Concentration bounds show empirical kernels closely follow their limiting counterparts.

03

The paper includes an asymptotic analysis of training and test errors in random feature regression.

Abstract

In this paper, we investigate a two-layer fully connected neural network of the form $f(X) = \frac{1}{d_1} a^\top \sigma(WX)$, where $X \in \mathbb{R}^{d_0 \times n}$ is a deterministic data matrix, $W \in \mathbb{R}^{d_1 \times d_0}$ and $a \in \mathbb{R}^{d_1}$ are random Gaussian weights, and $\sigma$ is a nonlinear activation function. We study the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$: the empirical conjugate kernel (CK) and neural tangent kernel (NTK), beyond the linear-width regime ($d_1 \asymp n$). We focus on the ultra-wide regime, where the width $d_1$ of the first layer is much larger than the sample size $n$. Under appropriate assumptions on $X$ and $\sigma$, a deformed semicircle law emerges as $d_1 / n \to \infty$ and $n \to \infty$. We first prove this limiting law for generalized…
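
The objects named in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it evaluates $f(X)$ with the normalization written in the abstract as shown on this page and builds an empirical conjugate kernel for a width $d_1$ much larger than $n$. The choice of `tanh` for $\sigma$, the $1/d_1$ normalization of the kernel, the unit-norm data columns, the dimensions, and the function names are all assumptions made for illustration. It does not verify the deformed semicircle law.

```python
import numpy as np


def two_layer_forward(X, W, a, sigma=np.tanh):
    """Evaluate f(X) = (1/d1) a^T sigma(W X), following the form in the abstract."""
    d1 = W.shape[0]
    return (a @ sigma(W @ X)) / d1


def empirical_ck(X, W, sigma=np.tanh):
    """Empirical conjugate kernel (1/d1) sigma(WX)^T sigma(WX), an n x n matrix.

    The 1/d1 normalization is an assumption of this sketch.
    """
    d1 = W.shape[0]
    Y = sigma(W @ X)  # first-layer features, shape (d1, n)
    return (Y.T @ Y) / d1


rng = np.random.default_rng(0)
d0, n, d1 = 50, 40, 4000  # hypothetical sizes with d1 much larger than n

# Data matrix, fixed once drawn; unit-norm columns are an assumption here.
X = rng.standard_normal((d0, n))
X /= np.linalg.norm(X, axis=0, keepdims=True)

# Random Gaussian weights, as in the abstract.
W = rng.standard_normal((d1, d0))
a = rng.standard_normal(d1)

f = two_layer_forward(X, W, a)  # network outputs, one per sample
K = empirical_ck(X, W)          # empirical CK
eigs = np.linalg.eigvalsh(K)    # real eigenvalues in ascending order
```

A histogram of `eigs` gives the empirical spectral distribution of this kernel for one draw of the weights. Comparing it with a limiting law would additionally require the centering and rescaling used in the paper, which the truncated abstract above does not specify.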

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Matrix Theory and Algorithms

Methods: Test · Neural Tangent Kernel