Large-width asymptotics for ReLU neural networks with $\alpha$-Stable initializations
Stefano Favaro, Sandra Fortini, Stefano Peluchetti

TL;DR
This paper extends the large-width asymptotic analysis of neural networks to $\alpha$-Stable initializations, revealing how non-Gaussian stable distributions influence the limiting processes and the training dynamics.
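To make "$\alpha$-Stable initialization" concrete, here is a minimal sketch that draws symmetric $\alpha$-Stable weights with the Chambers-Mallows-Stuck method, using only the Python standard library. The helpers `symmetric_stable` and `init_layer` are hypothetical names, not taken from the paper. The sketch assumes unit scale and zero skewness, and it applies no width rescaling. The paper's exact parameterization is not reproduced here.

```python
import math
import random


def symmetric_stable(alpha: float, rng: random.Random) -> float:
    """One symmetric alpha-Stable draw (unit scale, zero location) via the
    Chambers-Mallows-Stuck method, for 0 < alpha <= 2. At alpha = 2 the
    draw is N(0, 2), the Gaussian case in this parameterization."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                     # unit-mean exponential
    if alpha == 1.0:
        return math.tan(u)                       # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def init_layer(n_out: int, n_in: int, alpha: float, seed: int = 0):
    """Hypothetical helper: an n_out x n_in weight matrix with i.i.d.
    symmetric alpha-Stable entries; no width rescaling is applied."""
    rng = random.Random(seed)
    return [[symmetric_stable(alpha, rng) for _ in range(n_in)]
            for _ in range(n_out)]


# Usage: heavy-tailed (alpha = 1.5) weights for a layer with 3 inputs, 4 units.
W = init_layer(4, 3, alpha=1.5, seed=0)
print(W)
```

Setting `alpha=2.0` recovers Gaussian weights with variance 2. Smaller `alpha` produces occasional very large weights, and this heavy-tailed behaviour is what separates the $\alpha$-Stable setting from the Gaussian one.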
Contribution
It introduces an analysis of $\alpha$-Stable neural networks (NNs), showing that, suitably rescaled, they converge to $\alpha$-Stable processes, and characterizes their training dynamics through a new $\alpha$-Stable neural tangent kernel (NTK).
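At finite width, an NTK is the inner product of parameter gradients, $\Theta(x, x') = \langle \nabla_\theta f(x), \nabla_\theta f(x') \rangle$. The sketch below computes this quantity for a hypothetical one-hidden-layer ReLU network without biases, $f(x) = s \sum_j v_j \,\mathrm{ReLU}(w_j^\top x)$. Gradients are taken with respect to both the rows $w_j$ and the output weights $v_j$. The `scale` argument $s$ stands in for whatever width normalization the $\alpha$-Stable analysis prescribes, which is not reproduced here, so the value in the usage lines is only illustrative. To keep the block self-contained, the weights are standard Cauchy draws, which are $\alpha$-Stable with $\alpha = 1$.

```python
import math
import random


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def _dot(a, b) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def empirical_ntk(x, xp, W, v, scale: float) -> float:
    """Finite-width NTK of f(x) = scale * sum_j v[j] * relu(W[j] . x):
    the inner product of the gradients of f at x and at xp with respect
    to all parameters (the rows W[j] and the output weights v[j])."""
    x_xp = _dot(x, xp)
    total = 0.0
    for w_j, v_j in zip(W, v):
        z, zp = _dot(w_j, x), _dot(w_j, xp)
        # df/dv_j = scale * relu(w_j . x)
        total += relu(z) * relu(zp)
        # df/dw_j = scale * v_j * 1{w_j . x > 0} * x  (derivative 0 taken at z = 0)
        if z > 0.0 and zp > 0.0:
            total += v_j * v_j * x_xp
    return scale * scale * total


# Usage: standard Cauchy (alpha = 1 Stable) weights, illustrative scale only.
rng = random.Random(0)
n, d = 1000, 2
W = [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(d)] for _ in range(n)]
v = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
x, xp = [1.0, 0.0], [0.6, 0.8]
print(empirical_ntk(x, xp, W, v, scale=1.0 / n))
```

With different seeds, the Cauchy draws change the kernel value noticeably, because a few very large $v_j$ dominate the sum.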
Findings
Rescaled $\alpha$-Stable NNs converge to $\alpha$-Stable processes as the width increases.
The activation function affects the rescaling required for convergence to the $\alpha$-Stable process.
Gradient descent achieves zero training error at a linear rate in large-width $\alpha$-Stable NNs.
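Consider a one-hidden-layer network with scalar input to see why the activation can change the scaling. Each output term is a product $v_j g_j$ of an $\alpha$-Stable output weight and a hidden unit $g_j$. With a bounded activation such as tanh, $g_j$ has light tails, and the sum of $n$ terms is normalized by $n^{1/\alpha}$. With ReLU, $g_j$ inherits the $\alpha$-Stable tail of its pre-activation. The product then has a tail of order $t^{-\alpha}\log t$, and the normalization becomes $(n \log n)^{1/\alpha}$. The simulation below is a sketch under these assumptions, not the paper's deep-network construction. It fixes $\alpha = 1$ (Cauchy weights and biases), so the summands are symmetric and no centering is needed. `shallow_output` and `width_scale` are hypothetical helpers.

```python
import math
import random
import statistics


def shallow_output(n: int, activation, rng: random.Random, x: float = 1.0) -> float:
    """One draw of sum_{j<n} v_j * activation(w_j * x + b_j) for a hypothetical
    one-hidden-layer network whose weights v_j, w_j and biases b_j are i.i.d.
    standard Cauchy, i.e. symmetric alpha-Stable with alpha = 1."""
    def cauchy() -> float:
        return math.tan(math.pi * (rng.random() - 0.5))

    return sum(cauchy() * activation(cauchy() * x + cauchy()) for _ in range(n))


def width_scale(n: int, alpha: float, activation_name: str) -> float:
    """Normalizing constant: (n log n)^(1/alpha) for ReLU, n^(1/alpha) for a
    bounded activation such as tanh (any name other than "relu" is treated
    as bounded in this sketch)."""
    if activation_name == "relu":
        return (n * math.log(n)) ** (1.0 / alpha)
    return n ** (1.0 / alpha)


# Usage: robust spread (median of |output|) of the rescaled outputs at alpha = 1.
rng = random.Random(1)
n, draws = 1000, 100
for name, act in [("relu", lambda z: max(z, 0.0)), ("tanh", math.tanh)]:
    scaled = [shallow_output(n, act, rng) / width_scale(n, 1.0, name) for _ in range(draws)]
    print(name, statistics.median(abs(s) for s in scaled))
```

The sketch uses medians instead of variances because Cauchy variables have no finite mean or variance. Rerunning it at several widths $n$ gives a rough empirical check of each normalization.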
Abstract
There is a recent and growing literature on large-width asymptotic properties of Gaussian neural networks (NNs), namely NNs whose weights are initialized as Gaussian distributions. Two popular problems are: i) the study of the large-width distributions of NNs, which characterizes the infinitely wide limit of a rescaled NN in terms of a Gaussian stochastic process; ii) the study of the large-width training dynamics of NNs, which characterizes the infinitely wide dynamics in terms of a deterministic kernel, referred to as the neural tangent kernel (NTK), and shows that, for a sufficiently large width, gradient descent achieves zero training error at a linear rate. In this paper, we consider these problems for $\alpha$-Stable NNs, namely NNs whose weights are initialized as $\alpha$-Stable distributions with $\alpha \in (0, 2]$. First, for $\alpha$-Stable NNs with a ReLU activation…
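The "zero training error at a linear rate" claim in ii) rests on a standard kernel-regime argument. Suppose the kernel Gram matrix $K$ on the training inputs stays fixed and has smallest eigenvalue $\lambda_{\min} > 0$. Gradient descent on $\tfrac12\lVert f - y\rVert^2$ then updates the residual as $r_{t+1} = (I - \eta K)\, r_t$. Hence $\lVert r_t \rVert \le (1 - \eta\lambda_{\min})^t \lVert r_0 \rVert$ whenever $0 < \eta \le 1/\lambda_{\max}$. The sketch below runs this recursion with a hand-picked $2 \times 2$ kernel. It is a generic illustration only: it does not model the $\alpha$-Stable NTK or any finite-width effects, and `kernel_gd_residuals` is a hypothetical helper.

```python
import math


def kernel_gd_residuals(K, y, f0, lr: float, steps: int):
    """Residual norms ||f_t - y|| for gradient descent on 0.5 * ||f - y||^2
    when the kernel K is held fixed (linearized regime):
        f_{t+1} = f_t - lr * K (f_t - y),  so  r_{t+1} = (I - lr * K) r_t."""
    r = [fi - yi for fi, yi in zip(f0, y)]
    norms = [math.sqrt(sum(ri * ri for ri in r))]
    for _ in range(steps):
        Kr = [sum(k_ij * r_j for k_ij, r_j in zip(row, r)) for row in K]
        r = [ri - lr * kri for ri, kri in zip(r, Kr)]
        norms.append(math.sqrt(sum(ri * ri for ri in r)))
    return norms


# Usage: K has eigenvalues 3 and 1, so with lr = 1/3 the residual shrinks
# by at least a factor 1 - lr * lambda_min = 2/3 per step.
K = [[2.0, 1.0], [1.0, 2.0]]
norms = kernel_gd_residuals(K, y=[1.0, -1.0], f0=[0.5, 0.2], lr=1.0 / 3.0, steps=10)
print([round(nrm, 4) for nrm in norms])
```

The geometric decay of the residual norm is what "linear rate" means here. Its factor is set by $\lambda_{\min}$, which is why a well-conditioned kernel matters.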
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
Methods: Neural Tangent Kernel
