Skip Connections Eliminate Singularities
A. Emin Orhan, Xaq Pitkow

TL;DR
This paper proposes that skip connections ease the training of deep networks by eliminating singularities in the loss landscape that arise from the permutation symmetry of nodes, the elimination of nodes, and linear dependence among nodes.
Contribution
It offers a new explanation for the success of skip connections: they remove singularities that arise from the non-identifiability of the model. The explanation is supported by theoretical analysis and experiments on real-world datasets.
Findings
Skip connections break permutation symmetry of nodes.
They reduce the likelihood of node elimination.
They lessen linear dependence among nodes.
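The first two findings can be illustrated with a toy sketch. It is not taken from the paper: it assumes a single two-unit hidden layer with ReLU activations, no biases, and an identity skip of the form h = relu(W x) + x, and all names (`hidden`, `output`, `W_overlap`, `W_elim`) are hypothetical. Without the skip, some changes to the outgoing weights v leave the output unchanged, so those weights are not identifiable; with the skip, the same changes alter the output.

```python
def relu(z):
    return max(0.0, z)

def hidden(W, x, skip):
    """Hidden activations relu(W x); adds x itself when skip is True (assumed identity skip)."""
    pre = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    h = [relu(p) for p in pre]
    if skip:
        h = [h_i + x_i for h_i, x_i in zip(h, x)]
    return h

def output(v, W, x, skip):
    """Scalar output v . h for outgoing weights v."""
    return sum(v_i * h_i for v_i, h_i in zip(v, hidden(W, x, skip)))

x = [1.0, 3.0]   # illustrative input with x[0] != x[1]
v = [0.5, -1.0]  # illustrative outgoing weights

# Overlap case: both hidden units share the same incoming weights.
W_overlap = [[1.0, 2.0], [1.0, 2.0]]
d = 0.25
v_shifted = [v[0] + d, v[1] - d]  # keeps the sum v[0] + v[1] fixed

overlap_gap_plain = output(v_shifted, W_overlap, x, False) - output(v, W_overlap, x, False)
overlap_gap_skip = output(v_shifted, W_overlap, x, True) - output(v, W_overlap, x, True)

# Elimination case: the second unit has zero incoming weights.
W_elim = [[1.0, 2.0], [0.0, 0.0]]
v_changed = [v[0], 2.0]  # only the eliminated unit's outgoing weight changes

elim_gap_plain = output(v_changed, W_elim, x, False) - output(v, W_elim, x, False)
elim_gap_skip = output(v_changed, W_elim, x, True) - output(v, W_elim, x, True)

print(overlap_gap_plain, overlap_gap_skip)
print(elim_gap_plain, elim_gap_skip)
```

In the plain network both gaps are zero: in the overlap case only the sum of the two outgoing weights affects the output, and in the elimination case the outgoing weight of the dead unit has no effect. With the assumed skip term each unit also carries its own input coordinate, so the two units stay distinct and active, and both gaps become nonzero.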
Abstract
Skip connections made the training of very deep networks possible and have become an indispensable component in a variety of neural architectures. A completely satisfactory explanation for their success remains elusive. Here, we present a novel explanation for the benefits of skip connections in training very deep networks. The difficulty of training deep networks is partly due to the singularities caused by the non-identifiability of the model. Several such singularities have been identified in previous works: (i) overlap singularities caused by the permutation symmetry of nodes in a given layer, (ii) elimination singularities corresponding to the elimination, i.e. consistent deactivation, of nodes, (iii) singularities generated by the linear dependence of the nodes. These singularities cause degenerate manifolds in the loss landscape that slow down learning. We argue that skip…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
