Layer Normalization

Jimmy Lei Ba; Jamie Ryan Kiros; Geoffrey E. Hinton

arXiv:1607.06450 · stat.ML · July 22, 2016 · 498 cites

TL;DR

Layer normalization normalizes the activities of the neurons in a layer separately for each training case. This improves training stability and speed, especially in recurrent neural networks, and avoids batch normalization's dependence on the mini-batch size.

Contribution

This paper introduces layer normalization, a technique that computes its normalization statistics across the neurons of a layer for each individual training case. It applies to recurrent networks and performs the same computation during training and testing.
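
The per-case statistics described above can be written out as a short worked formula. The symbols here are notation chosen for illustration and do not appear on this page: H is the number of neurons in the layer, a_i is the summed input to neuron i for one training case, and any learned gain or bias applied afterwards is left out.

```latex
\mu = \frac{1}{H}\sum_{i=1}^{H} a_i,
\qquad
\sigma = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_i - \mu\right)^2},
\qquad
\hat{a}_i = \frac{a_i - \mu}{\sigma}
```

Both sums run over the neurons of one layer for a single case, so no quantity depends on the other cases in the mini-batch.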

Findings

01. Reduces training time significantly compared to previous methods

02. Stabilizes hidden state dynamics in recurrent neural networks

03. Applicable to both feed-forward and recurrent architectures

Abstract

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we…
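
The contrast the abstract draws between the two techniques can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy mini-batch, and the small `eps` term added for numerical safety are all assumptions made for illustration, and any learned gain or bias is omitted.

```python
import math


def layer_norm(summed_inputs, eps=1e-5):
    """Normalize the summed inputs of one layer for ONE training case.

    The mean and variance are taken over the neurons in the layer.
    `eps` is an illustrative safeguard against division by zero.
    """
    h = len(summed_inputs)
    mean = sum(summed_inputs) / h
    var = sum((a - mean) ** 2 for a in summed_inputs) / h
    return [(a - mean) / math.sqrt(var + eps) for a in summed_inputs]


def batch_norm(batch, eps=1e-5):
    """Normalize each neuron using statistics over the mini-batch.

    `batch` is a list of training cases, each a list of summed inputs.
    Only the training-time computation is sketched here.
    """
    n, h = len(batch), len(batch[0])
    means = [sum(case[j] for case in batch) / n for j in range(h)]
    variances = [
        sum((case[j] - means[j]) ** 2 for case in batch) / n for j in range(h)
    ]
    return [
        [(case[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(h)]
        for case in batch
    ]


# Hypothetical mini-batch: two training cases, four neurons each.
batch = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, -10.0, 0.0]]

# Layer normalization gives the same result for a case whether it is
# processed alone or alongside other cases.
ln_alone = layer_norm(batch[0])
ln_in_batch = [layer_norm(case) for case in batch]

# Batch normalization of the same case changes with the mini-batch contents.
bn_single = batch_norm(batch[:1])[0]
bn_full = batch_norm(batch)[0]
```

In this sketch `ln_alone` equals `ln_in_batch[0]`, while `bn_single` and `bn_full` differ, which mirrors the abstract's point that the effect of batch normalization depends on the mini-batch.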

Taxonomy

Topics: Neural Networks and Applications · Machine Learning in Materials Science · Machine Learning and ELM

Methods: Layer Normalization · Batch Normalization