Make Deep Networks Shallow Again
Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh

TL;DR
This paper shows that shallow, parallel neural network architectures can match the performance of conventional deep, sequential networks on MNIST and CIFAR10, which could simplify both network design and training.
Contribution
The paper gives a theoretical basis for replacing deep networks with shallow, parallel ones, derived from an expansion of stacked residual connections, and validates it with empirical results.
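For intuition, the expansion can be written out for the special case of linear residual branches, $F_k(x) = A_k x$. This linearity is a simplifying assumption made here for illustration only; the networks in the paper use nonlinear layers. Each residual layer computes $h_k = h_{k-1} + F_k(h_{k-1})$ with $h_0 = x$:

```latex
\begin{aligned}
h_k &= h_{k-1} + A_k h_{k-1} = (I + A_k)\,h_{k-1}, \qquad h_0 = x,\\
h_L &= (I + A_L)\cdots(I + A_1)\,x
     = \Big(I + \sum_{k=1}^{L} A_k + \sum_{1 \le i < j \le L} A_j A_i + \dots + A_L \cdots A_1\Big)\,x,\\
h_L &\approx \Big(I + \sum_{k=1}^{L} A_k\Big)\,x \qquad \text{(truncated after the first-order terms)}.
\end{aligned}
```

The truncated form applies every layer to the same input $x$ and adds the results, so the $L$ layers run side by side as one broad layer. For nonlinear branches, a first-order Taylor step such as $F_2(x + F_1(x)) \approx F_2(x) + J_{F_2}(x)\,F_1(x)$ plays the same role: dropping the Jacobian cross-term leaves $x + F_1(x) + F_2(x)$.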
Findings
Shallow, parallel architectures perform similarly to deep, sequential networks on MNIST and CIFAR10.
Replacing deep networks with shallow ones can simplify architecture and training.
The theoretical expansion suggests that truncating its higher-order terms yields effective shallow models.
Abstract
Deep neural networks have a good success record and are thus viewed as the best architecture choice for complex applications. For a long time, their main shortcoming was the vanishing gradient, which prevented numerical optimization algorithms from converging acceptably. A breakthrough came with residual connections: an identity mapping placed in parallel to a conventional layer. This concept is applicable to stacks of layers of the same dimension and substantially alleviates the vanishing gradient problem. A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion. This expansion suggests the possibility of truncating the higher-order terms and obtaining an architecture consisting of a single broad layer composed of all initially stacked layers in parallel. In other words, a sequential deep architecture is…
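The idea of collapsing a residual stack into one broad parallel layer can be checked numerically. The sketch below is not the authors' code: it assumes each residual branch has the hypothetical form F_k(x) = V_k tanh(W_k x), uses small random weights (the regime where higher-order terms are small), and the sizes and function names (`deep_residual`, `shallow_parallel`, `broad_layer`) are illustrative choices. It compares the sequential stack with its first-order truncation and shows that the parallel sum is literally one wide layer with stacked weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, L = 8, 16, 4  # residual width, hidden width per branch, number of branches (illustrative sizes)

# Hypothetical branch form F_k(x) = V_k @ tanh(W_k @ x); the small scale keeps each branch's output small.
Ws = [0.05 * rng.standard_normal((h, d)) for _ in range(L)]
Vs = [0.05 * rng.standard_normal((d, h)) for _ in range(L)]

def branch(k, x):
    """Residual branch F_k applied to x."""
    return Vs[k] @ np.tanh(Ws[k] @ x)

def deep_residual(x):
    """Sequential stack: h_k = h_{k-1} + F_k(h_{k-1})."""
    for k in range(L):
        x = x + branch(k, x)
    return x

def shallow_parallel(x):
    """First-order truncation: every branch sees the same input x and the outputs are summed."""
    return x + sum(branch(k, x) for k in range(L))

def broad_layer(x):
    """The same parallel sum written as a single layer of width L*h."""
    W = np.vstack(Ws)  # shape (L*h, d): all input weights stacked
    V = np.hstack(Vs)  # shape (d, L*h): all output weights side by side
    return x + V @ np.tanh(W @ x)

x = rng.standard_normal(d)
deep, shallow, broad = deep_residual(x), shallow_parallel(x), broad_layer(x)

print(np.allclose(shallow, broad))  # True: the parallel branches are exactly one broad layer
# Size of the dropped higher-order terms relative to the total residual update of the deep stack
print(np.linalg.norm(deep - shallow) / np.linalg.norm(deep - x))
```

The first check holds exactly, because tanh acts elementwise and the stacked matrices split into independent blocks. The second ratio measures what the truncation discards; with larger branch weights the higher-order terms, and therefore the gap between the deep and the shallow network, typically grow.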
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Image Enhancement Techniques
Methods: Residual Connection
