If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks
Arnu Pretorius, Elan van Biljon, Benjamin van Niekerk, Ryan Eloff, Matthew Reynard, Steve James, Benjamin Rosman, Herman Kamper, Steve Kroon

TL;DR
This study investigates whether critical initialisation remains important for ReLU networks when dropout limits the trainable depth, finding that in networks of shallow to moderate depth, the choice of initialisation has minimal impact on performance.
Contribution
The paper provides a large-scale statistical analysis showing that critical initialisation does not significantly affect training or generalisation in networks with limited depth due to dropout.
Findings
No statistically significant performance difference across a wide range of non-critical initialisations in trainable, limited-depth networks
Extreme initialisations perform worse than critical ones
Results also apply to standard ReLU networks of moderate depth
Abstract
Recent work in signal propagation theory has shown that dropout limits the depth to which information can propagate through a neural network. In this paper, we investigate the effect of initialisation on training speed and generalisation for ReLU networks within this depth limit. We ask the following research question: given that critical initialisation is crucial for training at large depth, if dropout limits the depth at which networks are trainable, does initialising critically still matter? We conduct a large-scale controlled experiment, and perform a statistical analysis of over trained networks. We find that (1) trainable networks show no statistically significant difference in performance over a wide range of non-critical initialisations; (2) for initialisations that show a statistically significant difference, the net effect on performance is small; (3) only extreme…
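To make "critical initialisation" concrete, the following is a minimal sketch of the layer-wise variance recursion for a ReLU network with dropout. It rests on assumptions that are not stated on this page: weights are drawn as N(0, sigma_w^2 / fan_in), dropout uses keep probability p with 1/p rescaling (so the second moment of the noise is 1/p), and the critical weight variance is sigma_w^2 = 2p, as commonly given in the signal propagation literature. The function names are hypothetical and not taken from the paper's code. The sketch tracks only the pre-activation variance; it does not model the depth limit itself.

```python
def critical_weight_variance(keep_prob):
    """Assumed critical weight variance for ReLU + dropout: sigma_w^2 = 2 * p."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must lie in (0, 1]")
    return 2.0 * keep_prob


def propagate_variance(q0, weight_var, keep_prob, depth, bias_var=0.0):
    """Iterate q_l = weight_var * mu2 * q_{l-1} / 2 + bias_var for `depth` layers.

    mu2 = 1 / keep_prob is the second moment of rescaled dropout noise, and
    the factor 1/2 is E[relu(z)^2] / E[z^2] for a zero-mean Gaussian z.
    Returns the list [q_0, q_1, ..., q_depth].
    """
    mu2 = 1.0 / keep_prob
    q = q0
    history = [q]
    for _ in range(depth):
        q = weight_var * mu2 * q / 2.0 + bias_var
        history.append(q)
    return history


if __name__ == "__main__":
    p = 0.5
    # Critical initialisation: the variance stays constant with depth.
    print(propagate_variance(1.0, critical_weight_variance(p), p, depth=10)[-1])
    # He-style variance of 2.0, ignoring dropout: the variance grows per layer.
    print(propagate_variance(1.0, 2.0, p, depth=10)[-1])
    # A smaller-than-critical variance: the variance shrinks per layer.
    print(propagate_variance(1.0, 0.5, p, depth=10)[-1])
```

Under these assumptions, any weight variance other than 2p makes the variance grow or shrink geometrically with depth, which is why critical initialisation matters for very deep networks. The study's question is whether that sensitivity still matters at the moderate depths that remain trainable under dropout.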
Taxonomy
Methods: Dropout
