Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, and Jeffrey Pennington

TL;DR
This paper shows that vanilla CNNs with 10,000 or more layers can be trained using a theoretically derived orthogonal initialization scheme that preserves signal propagation and achieves dynamical isometry.
Contribution
The paper develops a mean field theory of signal propagation for CNN initialization, derives the conditions for dynamical isometry, and provides an algorithm for generating orthogonal convolution kernels.
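To make the idea of an orthogonal convolution kernel concrete, here is a minimal NumPy sketch of one simple construction: a kernel that is zero everywhere except at its spatial center, where it holds a matrix with orthonormal rows. This is an illustration under stated assumptions, not necessarily the paper's full algorithm. The function name `delta_orthogonal_kernel`, the `(height, width, in_channels, out_channels)` layout, and the `gain` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def delta_orthogonal_kernel(k, c_in, c_out, gain=1.0, seed=0):
    """Return a (k, k, c_in, c_out) kernel that is zero except at the center tap.

    Hypothetical helper for illustration. The center tap holds a c_in x c_out
    matrix with orthonormal rows, so it preserves the norm of each pixel's
    channel vector. Assumes k is odd and c_in <= c_out.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the kernel has a center tap")
    if c_in > c_out:
        raise ValueError("this sketch requires c_in <= c_out")
    rng = np.random.default_rng(seed)
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix.
    a = rng.standard_normal((c_out, c_out))
    q, r = np.linalg.qr(a)
    # Fix column signs so the result does not depend on the QR sign convention.
    q = q * np.sign(np.diag(r))
    kernel = np.zeros((k, k, c_in, c_out))
    # The first c_in rows of an orthogonal matrix are orthonormal.
    kernel[k // 2, k // 2] = gain * q[:c_in, :]
    return kernel

kernel = delta_orthogonal_kernel(k=3, c_in=4, c_out=8)
center = kernel[1, 1]                      # shape (4, 8)
x = np.random.default_rng(1).standard_normal(4)
y = x @ center                             # channel mixing at a single pixel
```

With `gain=1.0` the center tap satisfies `center @ center.T == I`, so `y` has the same Euclidean norm as `x`. Because every off-center tap is zero, the convolution acts as the same norm-preserving map at every spatial location.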
Findings
Orthogonal initialization enables training of 10,000+ layer CNNs.
Theoretical conditions for signal propagation and dynamical isometry are established.
Empirical results confirm efficient training with the proposed initialization.
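Dynamical isometry means that the singular values of the network's input-output Jacobian stay close to 1. The sketch below illustrates this in a simplified setting that is not taken from the paper's experiments: a deep linear, fully connected network, where the Jacobian is just the product of the weight matrices. The function name `jacobian_singular_values` and the chosen depth and width are assumptions made for this example.

```python
import numpy as np

def jacobian_singular_values(depth, width, init, seed=0):
    """Singular values of the end-to-end Jacobian of a deep linear network.

    Hypothetical helper for illustration. `init` is "orthogonal" or "gaussian".
    """
    rng = np.random.default_rng(seed)
    jac = np.eye(width)
    for _ in range(depth):
        a = rng.standard_normal((width, width))
        if init == "orthogonal":
            # The Q factor of a Gaussian matrix is an orthogonal matrix.
            w, _ = np.linalg.qr(a)
        elif init == "gaussian":
            # Variance 1/width keeps the expected squared norm fixed per layer.
            w = a / np.sqrt(width)
        else:
            raise ValueError(f"unknown init: {init}")
        jac = w @ jac
    return np.linalg.svd(jac, compute_uv=False)

s_orth = jacobian_singular_values(depth=50, width=32, init="orthogonal")
s_gauss = jacobian_singular_values(depth=50, width=32, init="gaussian")
print("orthogonal: min %.3f, max %.3f" % (s_orth.min(), s_orth.max()))
print("gaussian condition number: %.3e" % (s_gauss.max() / s_gauss.min()))
```

A product of orthogonal matrices is itself orthogonal, so every singular value in `s_orth` equals 1 up to floating-point error. With Gaussian weights the singular values spread apart as depth grows, even though the variance is scaled to preserve norms on average.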
Abstract
In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies such as vanishing/exploding gradients make training such deep networks challenging. While residual connections and batch normalization do enable training at these depths, it has remained unclear whether such specialized architecture designs are truly necessary to train deep CNNs. In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme. We derive this initialization scheme theoretically by developing a mean field theory for signal propagation and by characterizing the conditions for dynamical isometry, the equilibration of singular…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Human Pose and Action Recognition
Methods: Convolution
