All you need is a good init

Dmytro Mishkin; Jiri Matas

arXiv:1511.06422 · cs.LG · February 22, 2016 · ICLR · 207 cites

TL;DR

The paper introduces LSUV initialization, a simple two-step method: pre-initialize the weights with orthonormal matrices, then normalize the output variance of each layer to one, proceeding from the first layer to the last. It enables training of very deep neural networks across a range of architectures and datasets.

Contribution

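The paper presents LSUV initialization, a straightforward method that rescales each layer so that its output has unit variance. In the reported experiments it yields test accuracy better than or equal to standard methods and trains at least as fast as more complex schemes designed for very deep nets, such as FitNets and Highway.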

Findings

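01. Achieves state-of-the-art or near state-of-the-art results on the MNIST, CIFAR, and ImageNet datasets.

02. Enables training of very deep networks with test accuracy comparable to or better than standard initialization methods, at a speed at least matching schemes built specifically for very deep nets.

03. Performs well across different activation functions (maxout, ReLU-family, tanh) and architectures.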

Abstract

Layer-sequential unit-variance (LSUV) initialization - a simple method for weight initialization for deep net learning - is proposed. The method consists of two steps. First, pre-initialize the weights of each convolution or inner-product layer with orthonormal matrices. Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be equal to one. Experiments with different activation functions (maxout, ReLU-family, tanh) show that the proposed initialization leads to learning of very deep nets that (i) produces networks with test accuracy better than or equal to that of standard methods and (ii) is at least as fast as the complex schemes proposed specifically for very deep nets such as FitNets (Romero et al. (2015)) and Highway (Srivastava et al. (2015)). Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the…
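The two steps described in the abstract can be sketched in NumPy for a small fully connected ReLU network. This is an illustrative sketch, not the authors' implementation: the function names `orthonormal`, `lsuv_init` and `layer_variances`, the tolerance `tol`, the iteration cap `max_iter`, the choice of ReLU, and the use of a random Gaussian batch are all assumptions made for the example.

```python
import numpy as np


def orthonormal(shape, rng):
    """Return a (rows, cols) matrix whose rows or columns are orthonormal."""
    a = rng.standard_normal(shape)
    # Thin SVD of a Gaussian matrix: one of u / vt has the requested shape.
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    return u if u.shape == tuple(shape) else vt


def lsuv_init(layer_sizes, batch, tol=0.05, max_iter=10, seed=0):
    """Sketch of LSUV for dense layers (hypothetical helper, assumed ReLU net).

    Step 1: start every layer from an orthonormal matrix.
    Step 2: going from the first layer to the last, rescale the weights until
            the layer output on `batch` has variance close to one.
    """
    rng = np.random.default_rng(seed)
    weights = []
    x = batch
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = orthonormal((fan_in, fan_out), rng)
        for _ in range(max_iter):
            var = (x @ w).var()
            if var == 0.0 or abs(var - 1.0) < tol:
                break  # already unit variance, or nothing to rescale
            w = w / np.sqrt(var)
        weights.append(w)
        # Feed the activated output forward so later layers see real inputs.
        x = np.maximum(x @ w, 0.0)
    return weights


def layer_variances(weights, batch):
    """Variance of each layer's pre-activation output on `batch`."""
    x, out = batch, []
    for w in weights:
        z = x @ w
        out.append(float(z.var()))
        x = np.maximum(z, 0.0)
    return out
```

Calling `lsuv_init([32, 64, 64, 10], batch)` with a `(256, 32)` data batch returns three weight matrices; `layer_variances` can then be used to confirm that each layer's output variance on that batch is close to one. Because each layer here is linear before the activation, a single rescaling is usually enough; the loop is kept to mirror the idea of iterating until the variance is within tolerance.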

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Layer-Sequential Unit-Variance Initialization · 1x1 Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling