If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks

Arnu Pretorius; Elan van Biljon; Benjamin van Niekerk; Ryan Eloff; Matthew Reynard; Steve James; Benjamin Rosman; Herman Kamper; Steve Kroon

arXiv:1910.05725·stat.ML·February 21, 2020

TL;DR

This study investigates whether critical initialisation remains important for ReLU networks when dropout limits the trainable depth. It finds that, for networks of shallow to moderate depth, the choice of initialisation has little effect on performance unless the initialisation is extreme.

Contribution

The paper provides a large-scale statistical analysis showing that initialising critically does not significantly affect training speed or generalisation in networks whose trainable depth is limited by dropout.

Findings

01. No statistically significant performance difference across a wide range of non-critical initialisations in trainable, depth-limited networks

02. Extreme initialisations perform worse than critical ones

03. Results also apply to standard ReLU networks of moderate depth

Abstract

Recent work in signal propagation theory has shown that dropout limits the depth to which information can propagate through a neural network. In this paper, we investigate the effect of initialisation on training speed and generalisation for ReLU networks within this depth limit. We ask the following research question: given that critical initialisation is crucial for training at large depth, if dropout limits the depth at which networks are trainable, does initialising critically still matter? We conduct a large-scale controlled experiment, and perform a statistical analysis of over $12000$ trained networks. We find that (1) trainable networks show no statistically significant difference in performance over a wide range of non-critical initialisations; (2) for initialisations that show a statistically significant difference, the net effect on performance is small; (3) only extreme…
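
To make "critical initialisation" concrete, here is a minimal sketch of the idea. It is not taken from this page. It assumes the variance recursion used in earlier signal propagation work on noisy ReLU networks, q^l = sigma_w^2 * mu_2 * q^(l-1) / 2, with zero biases and mu_2 = 1/p for inverted dropout with keep probability p. Under that assumption the critical weight variance scale is sigma_w^2 = 2p. The function names are hypothetical.

```python
def critical_sigma_w_sq(keep_prob):
    """Weight variance scale at which the assumed variance map is a fixed point.

    Assumption (not stated in the source): sigma_w^2 = 2 * p for a ReLU
    network with inverted dropout (keep probability p) and zero biases.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return 2.0 * keep_prob


def propagate_variance(q0, sigma_w_sq, keep_prob, depth):
    """Iterate the assumed map q <- sigma_w_sq * q / (2 * keep_prob).

    q0 is the pre-activation variance at the first layer; the return value
    is the variance after `depth` further layers.
    """
    q = q0
    for _ in range(depth):
        q = sigma_w_sq * q / (2.0 * keep_prob)
    return q


if __name__ == "__main__":
    p = 0.5
    # Critical scale for this dropout rate: the variance is preserved with depth.
    print(propagate_variance(1.0, critical_sigma_w_sq(p), p, depth=10))
    # Scale 2.0 (critical only without dropout): the variance doubles per layer.
    print(propagate_variance(1.0, 2.0, p, depth=10))
```

With p = 0.5, the critical scale holds the variance at its starting value, while the no-dropout scale of 2.0 doubles it at every layer. The question studied in the paper is whether this distinction still matters for training speed and generalisation within the depth that dropout allows.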

Taxonomy

Methods: Dropout