Layer Normalization
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

TL;DR
Layer normalization normalizes the summed inputs to the neurons in a layer separately for each training case. This improves training stability and speed, especially in recurrent neural networks, and avoids batch normalization's dependence on mini-batch size.
Contribution
This paper introduces layer normalization, a technique that normalizes across the features within a layer for each training case. It applies directly to recurrent networks and performs the same computation at training and test time.
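As a sketch of what "normalizing across the features within a layer" means, the statistics can be written as below. The symbols are not defined in this summary and are assumed here: H is the number of hidden units in the layer, a_i is the summed input to unit i for one training case, g and b are a learned gain and bias, and f is the layer's nonlinearity.

```latex
\mu = \frac{1}{H}\sum_{i=1}^{H} a_i, \qquad
\sigma = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_i - \mu\right)^2}, \qquad
h = f\!\left(\frac{g}{\sigma}\odot\left(a - \mu\right) + b\right)
```

Both sums run over the units of a single layer for a single case, so no quantity depends on the other cases in the mini-batch.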
Findings
Substantially reduces training time compared with previously published techniques
Stabilizes hidden state dynamics in recurrent neural networks
Applicable to both feed-forward and recurrent architectures
Abstract
Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we…
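The contrast drawn in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `eps` stabilizer, and the array shapes are assumptions chosen for illustration. Batch normalization takes its mean and variance over the mini-batch axis, while layer normalization takes them over the hidden-unit axis of each case.

```python
import numpy as np

def layer_norm(a, gain=None, bias=None, eps=1e-5):
    """Normalize each training case over the hidden units of its layer.

    `a` has shape (cases, hidden_units). `eps` is an assumed numerical
    stabilizer; `gain` and `bias` are optional learned parameters.
    """
    a = np.asarray(a, dtype=np.float64)
    mu = a.mean(axis=-1, keepdims=True)    # one mean per case
    var = a.var(axis=-1, keepdims=True)    # one variance per case
    a_hat = (a - mu) / np.sqrt(var + eps)
    if gain is not None:
        a_hat = a_hat * gain
    if bias is not None:
        a_hat = a_hat + bias
    return a_hat

def batch_norm_train(a, eps=1e-5):
    """Training-mode batch normalization: statistics per unit, over the batch."""
    a = np.asarray(a, dtype=np.float64)
    mu = a.mean(axis=0, keepdims=True)     # one mean per hidden unit
    var = a.var(axis=0, keepdims=True)     # one variance per hidden unit
    return (a - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))            # 4 training cases, 8 hidden units

# Layer norm gives the same result for a case whether it is processed
# alone or inside a larger batch.
ln_same = np.allclose(layer_norm(batch)[:1], layer_norm(batch[:1]))

# Batch norm output for the first case changes when the batch changes.
bn_same = np.allclose(batch_norm_train(batch)[:1],
                      batch_norm_train(batch[:2])[:1])
```

In this sketch `ln_same` is true and `bn_same` is false, which mirrors the abstract's point that batch normalization depends on the mini-batch while layer normalization does not.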
Code & Models
- 🤗 refactai/Refact-1_6-base (model) · 199 dl · ♡ 5
- 🤗 refactai/Refact-1_6B-fim (model) · 8.7k dl · ♡ 141
- 🤗 stabilityai/stablelm-3b-4e1t (model) · 35k dl · ♡ 312
- 🤗 stabilityai/japanese-stablelm-3b-4e1t-base (model) · 84 dl · ♡ 18
- 🤗 stabilityai/japanese-stablelm-3b-4e1t-instruct (model) · 63 dl · ♡ 29
- 🤗 cxllin/StableMed-3b (model) · 8 dl · ♡ 3
- 🤗 afrideva/stablelm-3b-4e1t-GGUF (model) · 2.0k dl · ♡ 1
- 🤗 maddes8cht/stabilityai-stablelm-3b-4e1t-gguf (model) · 314 dl · ♡ 4
- 🤗 maddes8cht/stabilityai-japanese-stablelm-3b-4e1t-base-gguf (model) · 154 dl
- 🤗 maddes8cht/stabilityai-japanese-stablelm-3b-4e1t-instruct-gguf (model) · 103 dl
Taxonomy
Topics: Neural Networks and Applications · Machine Learning in Materials Science · Machine Learning and ELM
Methods: Layer Normalization · Batch Normalization
