Random Walk Initialization for Training Very Deep Feedforward Networks

David Sussillo; L.F. Abbott

arXiv:1412.6558 · cs.NE · March 3, 2015 · 70 cites

TL;DR

This paper shows that scaling the initial weights of a very deep feedforward network so that the log-norm of the back-propagated gradient follows an unbiased random walk mitigates vanishing gradients. Theoretical analysis and empirical validation indicate that this scaling makes extremely deep models trainable.

Contribution

The paper introduces a random walk-based initialization method for deep feedforward networks, with theoretical scaling laws and empirical evidence of improved training stability.

Findings

01. Gradient log-norm scales with the square root of depth

02. The variance of the gradient log-norm is inversely proportional to layer width

03. The method enables training of very deep networks

Abstract

Training very deep networks is an important open problem in machine learning. One of many difficulties is that the norm of the back-propagated error gradient can grow or decay exponentially. Here we show that training very deep feed-forward networks (FFNs) is not as difficult as previously thought. Unlike when back-propagation is applied to a recurrent network, application to an FFN amounts to multiplying the error gradient by a different random matrix at each layer. We show that the successive application of correctly scaled random matrices to an initial vector results in a random walk of the log of the norm of the resulting vectors, and we compute the scaling that makes this walk unbiased. The variance of the random walk grows only linearly with network depth and is inversely proportional to the size of each layer. Practically, this implies a gradient whose log-norm scales with the…
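
The random-walk picture in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes Gaussian matrices with entries of variance g^2 / width, purely linear propagation with no nonlinearity, and a placeholder gain `g = 1.0` rather than the unbiased scaling the paper computes. The function name `log_norm_walk` and its parameters are hypothetical.

```python
import numpy as np


def log_norm_walk(width, depth, trials, g=1.0, seed=0):
    """Track ln|z_d| while a vector passes through `depth` random matrices.

    Returns an array of shape (trials, depth + 1); column d holds the
    log-norm after d layers. `g` is a placeholder gain (an assumption),
    not the scaling derived in the paper.
    """
    rng = np.random.default_rng(seed)
    # Start each trial from a random unit vector, so ln|z_0| = 0.
    z = rng.standard_normal((trials, width))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    log_norms = np.zeros((trials, depth + 1))
    for d in range(1, depth + 1):
        # A different random matrix at every layer (and for every trial),
        # with entries drawn from N(0, g^2 / width).
        W = rng.standard_normal((trials, width, width)) * (g / np.sqrt(width))
        z = np.einsum("tij,tj->ti", W, z)
        log_norms[:, d] = np.log(np.linalg.norm(z, axis=1))
    return log_norms


if __name__ == "__main__":
    for width in (16, 64):
        walk = log_norm_walk(width=width, depth=20, trials=500)
        var = walk.var(axis=0)  # variance across trials at each depth
        print(f"width={width}: var at depth 10 = {var[10]:.4f}, "
              f"at depth 20 = {var[20]:.4f}")
```

Under these assumptions, the variance across trials should roughly double between depth 10 and depth 20 and should drop by roughly a factor of four when the width goes from 16 to 64, in line with the linear-in-depth, inverse-in-width behavior the abstract describes. Changing `g` shifts the mean of the walk but not its variance in this linear setting.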

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM