Large-width asymptotics for ReLU neural networks with $\alpha$-Stable initializations

Stefano Favaro; Sandra Fortini; Stefano Peluchetti

arXiv:2206.08065·cs.LG·January 5, 2023

TL;DR

This paper extends the large-width asymptotic analysis of neural networks to $\alpha$-Stable initializations, revealing how non-Gaussian stable distributions influence the limiting processes and training dynamics.

Contribution

The paper analyzes $\alpha$-Stable neural networks, showing that they converge to $\alpha$-Stable processes and characterizing their training dynamics through a new $\alpha$-Stable NTK.

Findings

01

Rescaled $\alpha$-Stable NNs converge to $\alpha$-Stable processes as width increases.

02

Activation functions influence the scaling law for $\alpha$-Stable processes.

03

Gradient descent achieves zero training error at a linear rate in large-width $\alpha$-Stable NNs.
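
For orientation, "linear rate" in finding 03 can be read in the usual optimization sense of geometric decay of the training error. The display below is a generic sketch of that notion, not the paper's theorem: the training loss $L$, the step size $\eta$, and the constant $\lambda > 0$ are assumed symbols that do not appear in the text above, and the paper's own constants and conditions may differ.

```latex
% Generic form of a linear (geometric) convergence rate.
% L, \eta and \lambda are illustrative assumptions, not the paper's notation.
L(t) \le (1 - \eta \lambda)^{t} \, L(0),
\qquad 0 < \eta \lambda < 1, \quad t = 0, 1, 2, \dots
```

Under a bound of this shape, $L(t) \to 0$ as $t \to \infty$, which is the sense in which the training error reaches zero.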

Abstract

There is a recent and growing literature on large-width asymptotic properties of Gaussian neural networks (NNs), namely NNs whose weights are initialized from Gaussian distributions. Two popular problems are: i) the study of the large-width distributions of NNs, which characterizes the infinitely wide limit of a rescaled NN in terms of a Gaussian stochastic process; ii) the study of the large-width training dynamics of NNs, which characterizes the infinitely wide dynamics in terms of a deterministic kernel, referred to as the neural tangent kernel (NTK), and shows that, for a sufficiently large width, gradient descent achieves zero training error at a linear rate. In this paper, we consider these problems for $\alpha$-Stable NNs, namely NNs whose weights are initialized from $\alpha$-Stable distributions with $\alpha \in (0, 2]$. First, for $\alpha$-Stable NNs with a ReLU activation…
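
To make the notion of an $\alpha$-Stable initialization concrete, here is a minimal Python sketch of drawing symmetric $\alpha$-Stable weights and evaluating a one-hidden-layer ReLU network at initialization. Several things are assumptions made for illustration and are not taken from the paper: the function names `sample_symmetric_stable` and `shallow_relu_output`, the restriction to the symmetric, unit-scale case, and the use of the Chambers-Mallows-Stuck sampling formula. The rescaling factor is left as a caller-supplied `scale` argument, since the rescaling derived in the paper is not stated in the text above.

```python
import numpy as np


def sample_symmetric_stable(alpha, size, rng):
    """Draw symmetric alpha-stable samples with unit scale.

    Uses the Chambers-Mallows-Stuck formula (an illustrative choice,
    not necessarily the construction used in the paper).
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform random angle
    w = rng.exponential(1.0, size)                # unit-mean exponential
    if alpha == 1.0:
        return np.tan(u)                          # alpha = 1 is the Cauchy case
    return (
        np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )


def shallow_relu_output(x, width, alpha, scale, rng):
    """Output of a one-hidden-layer ReLU network with alpha-stable weights.

    `scale` is a hypothetical, caller-supplied rescaling factor; this sketch
    makes no claim about which rescaling yields the large-width limit.
    """
    w_in = sample_symmetric_stable(alpha, (width, x.shape[0]), rng)
    w_out = sample_symmetric_stable(alpha, width, rng)
    hidden = np.maximum(w_in @ x, 0.0)            # ReLU activation
    return scale * (w_out @ hidden)
```

In this parameterization, `alpha = 2.0` gives Gaussian draws with variance 2, so the Gaussian setting is the boundary case of the family, while smaller values of `alpha` give progressively heavier tails. Calling `shallow_relu_output` repeatedly with fresh random draws at increasing `width` is one way to explore empirically how the output distribution behaves under a chosen `scale`.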

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM

Methods: Neural Tangent Kernel