On the Selection of Initialization and Activation Function for Deep Neural Networks
Soufiane Hayou, Arnaud Doucet, Judith Rousseau

TL;DR
This paper analyzes how initialization and activation functions affect deep neural network training, showing that certain functions like Swish improve information propagation and training success at the edge of chaos.
Contribution
It provides a theoretical analysis of activation functions, identifying those that enhance information flow and training performance, including the Swish function.
Findings
For ReLU-like activation functions, information propagates deeper when the network is initialized at the edge of chaos.
Swish activation improves information propagation over ReLU-like functions.
Initializing at the edge of chaos improves the chances of successful training.
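For readers unfamiliar with the two activation families compared above, here is a minimal Python sketch. It assumes the common definitions ReLU(x) = max(x, 0) and Swish(x) = x · sigmoid(x); the function names are illustrative and are not taken from the paper.

```python
import math


def relu(x: float) -> float:
    """ReLU: passes positive inputs through, zeroes out negative ones."""
    return max(x, 0.0)


def swish(x: float) -> float:
    """Swish, assumed here to be x * sigmoid(x).

    The two branches are algebraically identical; the split only avoids
    overflow in math.exp for inputs of large magnitude.
    """
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return x * ex / (1.0 + ex)


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):+.4f}")
```

Unlike ReLU, Swish under this definition is smooth and takes small negative values for negative inputs, which is the kind of difference the findings above refer to.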
Abstract
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully as recently demonstrated by Schoenholz et al. (2017) who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the 'edge of chaos' can lead to good performance. We complete this analysis by providing quantitative results showing that, for a class of ReLU-like activation functions, the information propagates indeed deeper for an initialization at the edge of chaos. By further…
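The effect of initialization on forward propagation can be illustrated with a small Python sketch. It is not taken from the abstract: it assumes the standard mean-field setup (weights with variance sigma_w^2 / width, biases with variance sigma_b^2, infinite width), under which the pre-activation variance q of a ReLU network follows q_next = sigma_b^2 + sigma_w^2 * q / 2. The function names and the chosen hyperparameter values are hypothetical, and the sketch tracks only the variance, not the correlation between inputs that the edge-of-chaos analysis also considers.

```python
import math


def relu_variance_map(q: float, sigma_w: float, sigma_b: float) -> float:
    """One layer of the assumed mean-field variance recursion for ReLU.

    Uses E[relu(sqrt(q) * Z)^2] = q / 2 for a standard Gaussian Z.
    """
    return sigma_b ** 2 + sigma_w ** 2 * q / 2.0


def propagate_variance(q0: float, sigma_w: float, sigma_b: float, depth: int) -> list:
    """Return the variance at the input and after each of `depth` layers."""
    history = [q0]
    q = q0
    for _ in range(depth):
        q = relu_variance_map(q, sigma_w, sigma_b)
        history.append(q)
    return history


if __name__ == "__main__":
    depth = 50
    for sigma_w in (1.0, math.sqrt(2.0), 2.0):
        final_q = propagate_variance(1.0, sigma_w, 0.0, depth)[-1]
        print(f"sigma_w={sigma_w:.3f}  variance after {depth} layers: {final_q:.3e}")
```

Under these assumptions, sigma_w^2 = 2 with sigma_b = 0 keeps the variance constant across layers, while smaller or larger values make it shrink or grow exponentially with depth. This is the forward-propagation counterpart of the vanishing and exploding behaviour described in the abstract.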
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference · Advanced Neural Network Applications
Methods: Sigmoid Activation
