Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions
Alberto Bordino, Stefano Favaro, Sandra Fortini

TL;DR
This paper explores the behavior of deep neural networks with heavy-tailed Stable-distributed parameters as their width tends to infinity, extending previous Gaussian-based results to a broader class of activation functions using a generalized central limit theorem.
Contribution
It extends the characterization of infinitely wide limits of deep Stable neural networks to general activation functions under sequential growth, utilizing a generalized CLT for heavy tails.
Findings
Stable NN limits depend on activation function choice
Unified treatment of wide limits via heavy-tail CLT
Differences from Gaussian case in stability and scaling
Abstract
There is a growing literature on the study of large-width properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed parameters or weights, and Gaussian stochastic processes. Motivated by some empirical and theoretical studies showing the potential of replacing Gaussian distributions with Stable distributions, namely distributions with heavy tails, in this paper we investigate large-width properties of deep Stable NNs, i.e. deep NNs with Stable-distributed parameters. For sub-linear activation functions, a recent work has characterized the infinitely wide limit of a suitably rescaled deep Stable NN in terms of a Stable stochastic process, both under the assumption of a "joint growth" and under the assumption of a "sequential growth" of the width over the NN's layers. Here, assuming a "sequential growth" of the width, we extend such a characterization…
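As a rough illustration of what a "rescaled Stable NN" looks like in the sub-linear setting, the sketch below draws one random shallow network with symmetric alpha-stable weights and evaluates it at a scalar input. Everything here is an assumption made for illustration and is not taken from the paper: the function names `sample_symmetric_stable` and `shallow_stable_nn`, the choice of `tanh` as a bounded (hence sub-linear) activation, the single hidden layer, and the `width ** (-1 / alpha)` rescaling, which is the classical normalization for sums of alpha-stable variables. The sampler uses the Chambers-Mallows-Stuck method.

```python
import math
import random


def sample_symmetric_stable(alpha, rng):
    """Draw one symmetric alpha-stable variate with unit scale.

    Uses the Chambers-Mallows-Stuck representation for 0 < alpha <= 2.
    alpha = 1 gives a Cauchy variate; alpha = 2 gives a Gaussian with variance 2.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = max(rng.expovariate(1.0), 1e-300)  # guard against an exact zero
    if alpha == 1.0:
        return math.tan(u)
    return (
        math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
        * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )


def shallow_stable_nn(x, width, alpha, rng, activation=math.tanh):
    """Evaluate one random shallow network at scalar input x.

    Hypothetical architecture (an assumption, not the paper's definition):
        f(x) = width^(-1/alpha) * sum_j v_j * activation(w_j * x + b_j)
    with v_j, w_j, b_j i.i.d. symmetric alpha-stable.
    """
    total = 0.0
    for _ in range(width):
        w_j = sample_symmetric_stable(alpha, rng)  # input-to-hidden weight
        b_j = sample_symmetric_stable(alpha, rng)  # hidden bias
        v_j = sample_symmetric_stable(alpha, rng)  # hidden-to-output weight
        total += v_j * activation(w_j * x + b_j)
    return width ** (-1.0 / alpha) * total


if __name__ == "__main__":
    rng = random.Random(0)
    # Repeated draws at a fixed input give an empirical picture of the
    # output distribution as the width grows.
    for width in (10, 100, 1000):
        draws = [shallow_stable_nn(0.5, width, 1.5, rng) for _ in range(5)]
        print(width, [round(d, 3) for d in draws])
```

Because the output weights are heavy-tailed, individual draws can be far larger in magnitude than a Gaussian network of the same width would produce. The sketch does not address activations that grow linearly or faster, for which the appropriate rescaling may differ.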
Taxonomy
Topics: Machine Learning and ELM · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
