Functional Large Deviations for Wide Deep Neural Networks with Gaussian Initialization and Lipschitz Activations
Claudio Macci, Barbara Pacchiarotti, Katerina Papagiannouli, Giovanni Luca Torrisi, Dario Trevisan

TL;DR
This paper proves a large deviation principle for the output processes of wide deep neural networks with Gaussian weights and Lipschitz activations, including ReLU, on any compact input set.
Contribution
It extends large deviation results to entire network output processes with general Lipschitz activations and arbitrary compact input sets, beyond previous finite-input restrictions.
Findings
Large deviation principle established for network outputs
Applicable to ReLU and other Lipschitz activations
Extends results to infinite input sets
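For readers less familiar with the term, the generic textbook form of a large deviation principle is sketched below. The notation ($Z_n$ for the rescaled output process, $v_n$ for the speed, $I$ for the rate function, $\mathcal{X}$ for the function space on the compact input set) is chosen here for illustration. This page does not state the paper's specific speed, normalization, or rate function, so none is assumed.

```latex
% Generic definition of a large deviation principle (LDP).
% Z_n, v_n, I and \mathcal{X} are illustrative symbols, not the paper's notation.
A family $(Z_n)_{n \ge 1}$ of random elements of a topological space
$\mathcal{X}$ satisfies an LDP with speed $v_n \to \infty$ and rate function
$I : \mathcal{X} \to [0, \infty]$ (lower semicontinuous) if
\begin{align*}
  \limsup_{n \to \infty} \frac{1}{v_n} \log \mathbb{P}(Z_n \in F)
    &\le -\inf_{z \in F} I(z)
    && \text{for every closed } F \subseteq \mathcal{X}, \\
  \liminf_{n \to \infty} \frac{1}{v_n} \log \mathbb{P}(Z_n \in G)
    &\ge -\inf_{z \in G} I(z)
    && \text{for every open } G \subseteq \mathcal{X}.
\end{align*}
```

"Functional" here means that $\mathcal{X}$ is a space of functions on the input set, rather than a finite-dimensional space indexed by finitely many inputs.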
Abstract
We establish a functional large deviation principle for fully connected multi-layer perceptrons with i.i.d. Gaussian weights (LeCun initialization) and general Lipschitz activation functions, which includes the popular case of ReLU. The large deviation principle holds for the entire network output process on any compact input set. The proof combines exponential tightness for recursively defined processes, finite-dimensional large deviations, and the Dawson-Gärtner theorem, extending existing results that were restricted to finite input sets and less general activations.
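The object studied in the abstract can be illustrated with a minimal NumPy sketch that samples one random fully connected network and evaluates it on a grid of inputs. This is not the authors' code. The function name `sample_mlp_outputs`, the zero biases, the linear output layer, and the chosen widths are assumptions made for illustration. The only detail taken from the abstract is i.i.d. Gaussian weights with LeCun scaling (variance 1/fan_in) and a Lipschitz activation such as ReLU.

```python
import numpy as np


def sample_mlp_outputs(xs, widths, rng, activation=None):
    """Sample one random MLP and evaluate it on every row of xs.

    xs      : array of shape (num_inputs, d_in), a finite grid of inputs
    widths  : hidden-layer widths followed by the output dimension
    rng     : numpy.random.Generator
    Weights are i.i.d. N(0, 1/fan_in) (LeCun scaling).
    Assumptions for this sketch: biases are zero, the last layer is linear.
    """
    if activation is None:
        # ReLU is 1-Lipschitz, so it fits the abstract's assumption.
        activation = lambda z: np.maximum(z, 0.0)

    h = np.asarray(xs, dtype=float)
    for layer, width in enumerate(widths):
        fan_in = h.shape[1]
        # Standard deviation sqrt(1/fan_in) gives variance 1/fan_in.
        W = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, width))
        pre = h @ W
        is_last = layer == len(widths) - 1
        h = pre if is_last else activation(pre)
    return h


rng = np.random.default_rng(0)
# Five points of the compact interval [-1, 1], used as a finite input grid.
xs = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
out = sample_mlp_outputs(xs, widths=[256, 256, 1], rng=rng)
```

Each call draws a fresh network, so repeated calls give independent samples of the output evaluated on the grid. The paper's result concerns the law of the whole output function on a compact input set as the widths grow, not only its values at finitely many grid points.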
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications
