Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

Jeffrey Pennington; Samuel S. Schoenholz; Surya Ganguli

arXiv:1711.04735 · cs.LG · November 15, 2017 · 69 cites

Open Access

TL;DR

This paper uses free probability theory to analyze how weight initialization and nonlinearities affect the singular value distribution of Jacobians in deep networks, revealing that sigmoidal networks can achieve dynamical isometry and learn faster than ReLU networks.

Contribution

The paper extends the concept of dynamical isometry from deep linear networks to deep nonlinear networks using free probability theory. It shows that sigmoidal networks can achieve dynamical isometry with orthogonal weight initialization, and that doing so speeds up learning.

Findings

01. ReLU networks cannot achieve dynamical isometry.
02. Sigmoidal networks can achieve isometry with orthogonal initialization.
03. Dynamically isometric networks learn significantly faster.
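
The contrast between the first two findings can be probed numerically. The following is a minimal NumPy sketch, not the authors' code: it multiplies the per-layer Jacobians of a bias-free fully connected network and returns the singular values of the product. The helper names (`haar_orthogonal`, `jacobian_singular_values`), the width, depth, weight scales, and the small input scale are all illustrative assumptions chosen so the tanh pre-activations stay in the near-linear regime.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fixing the signs of R's diagonal makes the sample uniform (Haar).
    return q * np.sign(np.diag(r))


def jacobian_singular_values(phi, dphi, sigma_w, depth, width, input_scale, rng):
    """Singular values of d x^L / d x^0 for a bias-free network.

    Each layer computes h = W x and x = phi(h), with W = sigma_w * orthogonal.
    All hyperparameters here are assumptions made for illustration.
    """
    x = input_scale * rng.standard_normal(width)
    jac = np.eye(width)
    for _ in range(depth):
        w = sigma_w * haar_orthogonal(width, rng)
        h = w @ x
        # Per-layer Jacobian is diag(phi'(h)) @ W; accumulate the product.
        jac = (dphi(h)[:, None] * w) @ jac
        x = phi(h)
    return np.linalg.svd(jac, compute_uv=False)


rng = np.random.default_rng(0)
width, depth = 128, 32

# tanh with unit-scale orthogonal weights and a small-norm input (assumed setup).
sv_tanh = jacobian_singular_values(
    np.tanh, lambda h: 1.0 - np.tanh(h) ** 2,
    sigma_w=1.0, depth=depth, width=width, input_scale=0.05, rng=rng,
)

# ReLU with orthogonal weights scaled by sqrt(2) (assumed variance-preserving scale).
sv_relu = jacobian_singular_values(
    lambda h: np.maximum(h, 0.0), lambda h: (h > 0).astype(float),
    sigma_w=np.sqrt(2.0), depth=depth, width=width, input_scale=1.0, rng=rng,
)

print("tanh  min/max singular value:", sv_tanh.min(), sv_tanh.max())
print("ReLU  min/max singular value:", sv_relu.min(), sv_relu.max())
```

In this setup the tanh singular values cannot exceed 1, because each layer is an orthogonal matrix followed by a diagonal matrix with entries in (0, 1], and they stay clustered as long as the pre-activations remain small. For ReLU, roughly half the units are inactive in each layer, so the Jacobian is rank-deficient and many singular values are numerically zero regardless of how the weights are scaled. This is only a single-sample illustration; the paper's analysis concerns the full limiting spectrum.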

Abstract

It is well known that the initialization of weights in deep neural networks can have a dramatic impact on learning speed. For example, ensuring the mean squared singular value of a network's input-output Jacobian is $O(1)$ is essential for avoiding the exponential vanishing or explosion of gradients. The stronger condition that all singular values of the Jacobian concentrate near $1$ is a property known as dynamical isometry. For deep linear networks, dynamical isometry can be achieved through orthogonal weight initialization and has been shown to dramatically speed up learning; however, it has remained unclear how to extend these results to the nonlinear setting. We address this question by employing powerful tools from free probability theory to compute analytically the entire singular value distribution of a deep network's input-output Jacobian. We explore the dependence of the…
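
The two conditions named in the abstract can be written out as follows. This is a sketch under assumed notation that does not appear on this page: a fully connected network of depth $L$ and width $N$ with pre-activations $h^l = W^l x^{l-1} + b^l$, activations $x^l = \phi(h^l)$, and singular values $s_1, \dots, s_N$ of the input-output Jacobian $J$.

$$
J = \frac{\partial x^L}{\partial x^0} = D^L W^L \, D^{L-1} W^{L-1} \cdots D^1 W^1,
\qquad D^l_{ij} = \phi'(h^l_i)\,\delta_{ij}
$$

$$
\text{mean squared singular value:}\quad \frac{1}{N}\operatorname{tr}\!\left(J J^{T}\right) = \frac{1}{N}\sum_{i=1}^{N} s_i^2 = O(1)
$$

$$
\text{dynamical isometry:}\quad s_i \approx 1 \quad \text{for all } i = 1, \dots, N
$$

The first condition constrains only the average of the $s_i^2$, so it still permits a spectrum with a few very large and many very small singular values. Dynamical isometry is stronger because it constrains the whole distribution.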

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Gaussian Processes and Bayesian Inference