All you need is a good init
Dmytro Mishkin, Jiri Matas

TL;DR
The paper introduces LSUV initialization, a simple two-step method: pre-initialize the weights with orthonormal matrices, then normalize the output variance of each layer to one, proceeding from the first layer to the last. It enables training of very deep neural networks across several architectures and datasets.
Contribution
The paper presents layer-sequential unit-variance (LSUV) initialization, a simple weight initialization that scales each layer so that its output has unit variance. It matches or surpasses more complex schemes proposed specifically for very deep nets.
Findings
Achieves state-of-the-art or near state-of-the-art results on MNIST, CIFAR, and ImageNet datasets.
Enables training of very deep networks with test accuracy better than or equal to standard methods, at a speed at least matching schemes such as FitNets and Highway.
Performs well across different activation functions and architectures.
Abstract
Layer-sequential unit-variance (LSUV) initialization - a simple method for weight initialization for deep net learning - is proposed. The method consists of two steps. First, pre-initialize weights of each convolution or inner-product layer with orthonormal matrices. Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be equal to one. Experiments with different activation functions (maxout, ReLU-family, tanh) show that the proposed initialization leads to learning of very deep nets that (i) produces networks with test accuracy better than or equal to standard methods and (ii) is at least as fast as the complex schemes proposed specifically for very deep nets such as FitNets (Romero et al. (2015)) and Highway (Srivastava et al. (2015)). Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the…
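The two steps described in the abstract can be sketched for a small fully connected network as follows. This is a minimal illustration, not the authors' implementation: the function names `orthonormal` and `lsuv_init`, the tolerance `tol`, the iteration cap `max_iter`, and the choice to measure variance on each layer's pre-activation output over a single data batch are all assumptions made for the example. Biases and convolution layers are omitted.

```python
import numpy as np

def orthonormal(shape, rng):
    """Return a (rows, cols) matrix whose columns (or rows, if wide) are orthonormal."""
    rows, cols = shape
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, _ = np.linalg.qr(a)  # reduced QR: q has orthonormal columns, shape (max, min)
    return q if rows >= cols else q.T

def lsuv_init(layer_sizes, batch, activation=np.tanh, tol=0.01, max_iter=10, seed=0):
    """Hypothetical LSUV-style init for a dense network (illustrative only).

    layer_sizes: e.g. [32, 64, 64, 10]; batch: array of shape (n_samples, layer_sizes[0]).
    """
    rng = np.random.default_rng(seed)
    weights = []
    x = batch
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Step 1: pre-initialize the layer with an orthonormal matrix.
        w = orthonormal((fan_in, fan_out), rng)
        # Step 2: rescale until the layer's output variance on the batch is close to one.
        for _ in range(max_iter):
            var = (x @ w).var()
            if abs(var - 1.0) < tol:
                break
            w = w / np.sqrt(var)
        weights.append(w)
        # Feed the normalized layer's activations to the next layer.
        x = activation(x @ w)
    return weights
```

Calling `lsuv_init([32, 64, 64, 10], batch)` with a `batch` of shape `(256, 32)` returns three weight matrices. For a purely linear layer like this one, a single rescale already brings the output variance to one, so the inner loop mainly matters when the measured output depends nonlinearly on the weights.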
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: Layer-Sequential Unit-Variance Initialization · 1x1 Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling
