Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
Tim Salimans, Diederik P. Kingma

TL;DR
Weight normalization reparameterizes each weight vector of a neural network so that its length is decoupled from its direction. This improves the conditioning of the optimization problem and speeds up training without introducing dependencies between the examples in a minibatch, so the method also applies to recurrent models and deep reinforcement learning.
Contribution
Introduces weight normalization, a simple reparameterization of the weight vectors that accelerates training across diverse neural network architectures without the minibatch dependencies that batch normalization introduces.
Findings
Speeds up convergence of stochastic gradient descent.
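Applies to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning and generative models.
Has lower computational overhead than batch normalization while providing much of its speed-up.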
Abstract
We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization…
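The reparameterization described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the form w = g * v / ||v|| given in the full paper (a scalar length g and a direction vector v); the function names `weight_norm` and `weight_norm_grads` are illustrative and not taken from the authors' code.

```python
import numpy as np

def weight_norm(v, g):
    """Build the effective weight w = g * v / ||v|| from direction v and length g."""
    return g * v / np.linalg.norm(v)

def weight_norm_grads(v, g, grad_w):
    """Map a gradient with respect to w onto the new parameters (v, g)."""
    norm_v = np.linalg.norm(v)
    # Gradient for the length parameter: projection of grad_w onto the unit direction.
    grad_g = np.dot(grad_w, v) / norm_v
    # Gradient for the direction parameter: scaled grad_w with its component along v removed.
    grad_v = (g / norm_v) * grad_w - (g * grad_g / norm_v**2) * v
    return grad_v, grad_g

# Hypothetical values chosen only to make the arithmetic easy to follow.
v = np.array([3.0, 4.0])
g = 2.0
w = weight_norm(v, g)
grad_v, grad_g = weight_norm_grads(v, g, grad_w=np.array([1.0, 0.0]))
```

In this sketch the norm of `w` always equals `g` regardless of the scale of `v`, and `grad_v` is orthogonal to `v`, which is one way to see that length and direction are learned separately. No minibatch statistics appear anywhere in the computation.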
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Weight Normalization · Batch Normalization
