Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions

Alberto Bordino; Stefano Favaro; Sandra Fortini

arXiv:2304.04008·cs.LG·April 11, 2023·1 cites

TL;DR

This paper explores the behavior of deep neural networks with heavy-tailed Stable-distributed parameters as their width tends to infinity, extending previous Gaussian-based results to a broader class of activation functions using a generalized central limit theorem.

Contribution

It extends the characterization of infinitely wide limits of deep Stable neural networks to general activation functions under sequential growth, utilizing a generalized CLT for heavy tails.

Findings

01 Stable NN limits depend on activation function choice

02 Unified treatment of wide limits via heavy-tail CLT

03 Differences from Gaussian case in stability and scaling

Abstract

There is a growing literature on the study of large-width properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed parameters or weights, and Gaussian stochastic processes. Motivated by some empirical and theoretical studies showing the potential of replacing Gaussian distributions with Stable distributions, namely distributions with heavy tails, in this paper we investigate large-width properties of deep Stable NNs, i.e. deep NNs with Stable-distributed parameters. For sub-linear activation functions, a recent work has characterized the infinitely wide limit of a suitably rescaled deep Stable NN in terms of a Stable stochastic process, both under the assumption of a "joint growth" and under the assumption of a "sequential growth" of the width over the NN's layers. Here, assuming a "sequential growth" of the width, we extend such a characterization…
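To make the object in the abstract concrete, the following is a minimal sketch of a single-hidden-layer NN with symmetric Stable weights, not the authors' construction. Everything named here is an assumption for illustration: the helper names `sample_sas` and `shallow_stable_nn`, the Chambers-Mallows-Stuck sampler, the choice of tanh as a bounded (hence sub-linear) activation, and the width**(-1/alpha) rescaling, which is the usual generalized-CLT normalization for bounded summands. The paper itself treats deep networks and further activation classes, which this sketch does not cover.

```python
import math
import random


def sample_sas(alpha, rng):
    """Draw one standard symmetric alpha-Stable variate, 0 < alpha <= 2.

    Uses the Chambers-Mallows-Stuck representation (an assumed sampler,
    not taken from the paper). alpha = 2 yields a Gaussian with variance 2,
    alpha = 1 yields a standard Cauchy.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def shallow_stable_nn(x, width, alpha, rng, phi=math.tanh):
    """Scalar output of a one-hidden-layer NN with i.i.d. Stable parameters.

    The sum over hidden units is rescaled by width**(-1/alpha); this scaling
    is an assumption suited to a bounded activation such as tanh.
    """
    total = 0.0
    for _ in range(width):
        hidden = sample_sas(alpha, rng) * x + sample_sas(alpha, rng)  # weight * x + bias
        total += sample_sas(alpha, rng) * phi(hidden)                 # output weight
    return total / width ** (1.0 / alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    # Repeated draws of the rescaled output at a fixed input, for growing width.
    for n in (10, 100, 1000):
        draws = [shallow_stable_nn(0.5, n, 1.5, rng) for _ in range(5)]
        print(n, [round(d, 3) for d in draws])
```

Each call draws a fresh network, so repeated calls at a fixed input give samples from the output distribution at that width; comparing empirical quantiles across widths is one rough way to see the rescaled output settle as the width grows.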

Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Stochastic Gradient Optimization Techniques