Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Tim Salimans; Diederik P. Kingma

arXiv:1602.07868 · cs.LG · June 7, 2016 · 929 cites

TL;DR

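Weight normalization reparameterizes each weight vector in a neural network so that its length is decoupled from its direction. This improves the conditioning of the optimization problem and speeds up training without introducing dependencies between the examples in a minibatch, so the method also applies to recurrent models and to noise-sensitive settings such as deep reinforcement learning.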

Contribution

Introduces weight normalization, a simple reparameterization of weight vectors that accelerates training across diverse neural network architectures without the minibatch dependencies that batch normalization introduces.

Findings

01. Speeds up convergence of stochastic gradient descent.

02. Applicable to recurrent models and noise-sensitive applications.

03. Reduces computational overhead compared to batch normalization.

Abstract

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization…
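The abstract describes the reparameterization only in words. A minimal NumPy sketch follows, assuming the usual form w = g · v / ‖v‖, where the scalar g carries the length and the vector v carries the direction. The function names `weight_norm` and `weight_norm_grads` and the example values are illustrative assumptions, not taken from this page or from any library.

```python
import numpy as np


def weight_norm(v, g):
    """Return w = g * v / ||v||, so the length of w is |g| regardless of ||v||."""
    return g * v / np.linalg.norm(v)


def weight_norm_grads(v, g, grad_w):
    """Map a gradient with respect to w onto gradients with respect to g and v.

    These follow from applying the chain rule to w = g * v / ||v||.
    """
    norm_v = np.linalg.norm(v)
    grad_g = np.dot(grad_w, v) / norm_v
    grad_v = (g / norm_v) * grad_w - (g * grad_g / norm_v**2) * v
    return grad_g, grad_v


# Illustrative values (assumed for this sketch).
v = np.array([3.0, 4.0])       # direction parameter, ||v|| = 5
g = 2.0                        # length parameter
w = weight_norm(v, g)          # effective weight vector used by the layer

grad_w = np.array([1.0, -1.0])  # stand-in for a gradient from backpropagation
grad_g, grad_v = weight_norm_grads(v, g, grad_w)
```

In this sketch the gradient with respect to v is always orthogonal to v, which is one way to see the decoupling: an update to v changes the direction of w, while its length is controlled by g alone.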

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Weight Normalization · Batch Normalization