On the Impact of the Activation Function on Deep Neural Networks Training
Soufiane Hayou, Arnaud Doucet, Judith Rousseau

TL;DR
This paper analyzes how the choice of activation function and initialization parameters affects the training efficiency and performance of deep neural networks, emphasizing the importance of the 'Edge of Chaos' for successful training.
Contribution
It provides a comprehensive theoretical analysis showing how tuning initialization and activation functions can accelerate training and enhance deep neural network performance.
Findings
Proper tuning of initialization accelerates training
Activation function choice impacts network performance
Edge of Chaos is critical for trainability
Abstract
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Samuel et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the 'Edge of Chaos' can lead to good performance. While the work by Samuel et al. (2017) discusses trainability issues, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune…
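As a rough illustration of what tuning the initialization to the Edge of Chaos can mean in practice, the sketch below uses the standard mean-field recursions from the literature the abstract refers to. It is not taken from this page: it assumes a tanh activation, Gaussian weights with variance sigma_w^2 / width and biases with variance sigma_b^2, the variance map q = sigma_b^2 + sigma_w^2 E[phi(sqrt(q) Z)^2], and the criticality condition chi = sigma_w^2 E[phi'(sqrt(q) Z)^2] = 1. The function names, the bracket [1, 3] and the value sigma_b = 0.3 are hypothetical choices made only for this example.

```python
import numpy as np

# Gauss-Hermite (probabilists') rule, rescaled so that sum(w * f(z)) ~ E[f(Z)], Z ~ N(0, 1).
_nodes, _weights = np.polynomial.hermite_e.hermegauss(100)
_weights = _weights / np.sqrt(2.0 * np.pi)


def gauss_mean(f):
    """Approximate E[f(Z)] for a standard normal Z by quadrature."""
    return float(np.sum(_weights * f(_nodes)))


def fixed_point_variance(sigma_w, sigma_b, iters=500, q0=1.0):
    """Iterate the mean-field variance map for tanh until it (approximately) settles."""
    q = q0
    for _ in range(iters):
        q = sigma_b**2 + sigma_w**2 * gauss_mean(
            lambda z: np.tanh(np.sqrt(q) * z) ** 2
        )
    return q


def chi(sigma_w, sigma_b):
    """chi = sigma_w^2 * E[tanh'(sqrt(q) Z)^2] at the fixed-point variance q."""
    q = fixed_point_variance(sigma_w, sigma_b)
    return sigma_w**2 * gauss_mean(
        lambda z: (1.0 - np.tanh(np.sqrt(q) * z) ** 2) ** 2
    )


def edge_of_chaos_sigma_w(sigma_b, lo=1.0, hi=3.0, tol=1e-6):
    """Bisect on sigma_w for chi = 1 (assumes chi < 1 at lo and chi > 1 at hi)."""
    if not (chi(lo, sigma_b) < 1.0 < chi(hi, sigma_b)):
        raise ValueError("bracket [lo, hi] does not straddle chi = 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi(mid, sigma_b) < 1.0:
            lo = mid  # still on the ordered side, move right
        else:
            hi = mid  # on the chaotic side, move left
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    sigma_b = 0.3  # hypothetical bias scale for the example
    sigma_w = edge_of_chaos_sigma_w(sigma_b)
    print(f"sigma_b={sigma_b}: critical sigma_w ~ {sigma_w:.4f}, chi={chi(sigma_w, sigma_b):.6f}")
```

Under these assumptions, values of sigma_w below the returned one give chi < 1 (the ordered side) and values above it give chi > 1 (the chaotic side). Bisection is used only because it is simple to check; any scalar root finder would serve.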
Taxonomy
Topics: Neural Networks and Applications
