Random Walk Initialization for Training Very Deep Feedforward Networks
David Sussillo, L.F. Abbott

TL;DR
This paper shows, through theoretical analysis and empirical validation, that scaling the initial weights of a very deep feedforward network so that the log-norm of the back-propagated gradient follows an unbiased random walk mitigates vanishing and exploding gradients and makes extremely deep models trainable.
Contribution
It introduces a novel random walk-based initialization method for deep feedforward networks, providing theoretical scaling laws and empirical evidence for improved training stability.
Findings
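The gradient log-norm scales with the square root of network depth, because the variance of its random walk grows only linearly with depth
The variance of the gradient log-norm is inversely proportional to layer width, so wider layers reduce it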
The method enables training of very deep networks
Abstract
Training very deep networks is an important open problem in machine learning. One of many difficulties is that the norm of the back-propagated error gradient can grow or decay exponentially. Here we show that training very deep feed-forward networks (FFNs) is not as difficult as previously thought. Unlike when back-propagation is applied to a recurrent network, application to an FFN amounts to multiplying the error gradient by a different random matrix at each layer. We show that the successive application of correctly scaled random matrices to an initial vector results in a random walk of the log of the norm of the resulting vectors, and we compute the scaling that makes this walk unbiased. The variance of the random walk grows only linearly with network depth and is inversely proportional to the size of each layer. Practically, this implies a gradient whose log-norm scales with the…
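The random-walk picture in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: it pushes a unit vector through a stack of independent Gaussian matrices and records the log of its norm after each layer. The function name `log_norm_walk`, the gain parameter `g`, and the plain `g / sqrt(width)` scaling are assumptions made for illustration; in particular, the sketch does not compute the specific scaling the paper derives to make the walk unbiased.

```python
import numpy as np


def log_norm_walk(width, depth, trials, g=1.0, seed=0):
    """Simulate the log-norm of a vector under repeated random matrices.

    Hypothetical helper for illustration only. Returns an array of shape
    (trials, depth + 1); column d holds log ||z_d|| after d layers.
    """
    rng = np.random.default_rng(seed)

    # Start every trial from a random unit vector, so the log-norm is 0.
    z = rng.standard_normal((trials, width))
    z /= np.linalg.norm(z, axis=1, keepdims=True)

    log_norms = np.zeros((trials, depth + 1))
    for d in range(1, depth + 1):
        # A different random matrix at each layer (and for each trial),
        # with entries drawn from N(0, g^2 / width). The gain g is an
        # assumed free parameter, not the paper's derived value.
        W = rng.standard_normal((trials, width, width)) * (g / np.sqrt(width))
        z = np.einsum("tij,tj->ti", W, z)
        log_norms[:, d] = np.log(np.linalg.norm(z, axis=1))
    return log_norms


if __name__ == "__main__":
    narrow = log_norm_walk(width=16, depth=50, trials=200)
    wide = log_norm_walk(width=64, depth=50, trials=200)
    # Variance across trials at selected depths: it should grow roughly
    # linearly with depth and be smaller for the wider layers.
    for d in (10, 25, 50):
        print(d, narrow[:, d].var(), wide[:, d].var())
```

Comparing the printed variances across depths and across the two widths gives a quick check of the two scaling statements in the abstract: roughly linear growth with depth, and shrinkage as the layer size increases.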
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
