Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Guodong Zhang, Aleksandar Botev, James Martens

TL;DR
This paper introduces a novel activation-function transformation, compatible with Leaky ReLUs, that makes deep vanilla neural networks (networks without shortcut connections or normalization layers) trainable, with accuracy competitive with ResNets.
Contribution
The authors develop a new transformation method for Leaky ReLUs that improves the training of deep vanilla networks, surpassing previous methods such as Edge of Chaos (EOC) in accuracy and depth scalability.
Findings
Deep vanilla networks trained with the new method achieve validation accuracy competitive with ResNets.
The method is computationally efficient and compatible with Leaky ReLUs.
Validation accuracy remains stable or improves with increasing depth.
Abstract
Training very deep neural networks is still an extremely challenging task. The common solution is to use shortcut connections and normalization layers, which are both crucial ingredients in the popular ResNet architecture. However, there is strong evidence to suggest that ResNets behave more like ensembles of shallower networks than truly deep ones. Recently, it was shown that deep vanilla networks (i.e. networks without normalization layers or shortcut connections) can be trained as fast as ResNets by applying certain transformations to their activation functions. However, this method (called Deep Kernel Shaping) isn't fully compatible with ReLUs, and produces networks that overfit significantly more than ResNets on ImageNet. In this work, we rectify this situation by developing a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs. We show in…
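The abstract excerpt does not spell out the transformation itself, so the following is only an illustrative sketch of the general idea of tailoring a Leaky ReLU, not the authors' method. It assumes one standard ingredient: rescaling a Leaky ReLU with negative slope `alpha` by `sqrt(2 / (1 + alpha**2))` so that its output has unit second moment under a standard Gaussian input. The function names and the numerical check are hypothetical, and how the slope should be chosen for a given depth is not covered here.

```python
import math


def leaky_relu(x, alpha):
    """Leaky ReLU with negative slope alpha."""
    return x if x >= 0.0 else alpha * x


def scaled_leaky_relu(x, alpha):
    """Leaky ReLU rescaled so that E[phi(z)^2] = 1 for z ~ N(0, 1).

    Assumption for illustration: the scale sqrt(2 / (1 + alpha^2)) follows
    from E[max(z, 0)^2] = 1/2 and E[(alpha * min(z, 0))^2] = alpha^2 / 2.
    """
    scale = math.sqrt(2.0 / (1.0 + alpha ** 2))
    return scale * leaky_relu(x, alpha)


def gaussian_second_moment(fn, lo=-10.0, hi=10.0, steps=20001):
    """Approximate E[fn(z)^2] for z ~ N(0, 1) with the trapezoidal rule."""
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        z = lo + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * fn(z) ** 2 * density * h
    return total


# alpha = 0 is a rescaled ReLU; alpha = 1 is the identity (fully linear).
for alpha in (0.0, 0.5, 1.0):
    q = gaussian_second_moment(lambda z: scaled_leaky_relu(z, alpha))
    print(f"alpha={alpha:.1f}  second moment={q:.4f}")
```

Moving `alpha` between 0 and 1 interpolates between a rescaled ReLU and a purely linear map while keeping the output scale fixed, which is the kind of knob a kernel-shaping approach can tune.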
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
Methods: Batch Normalization · Residual Connection · 1x1 Convolution · Average Pooling · Bottleneck Residual Block · Kaiming Initialization · Max Pooling · Residual Block · Global Average Pooling
