Root Mean Square Layer Normalization
Biao Zhang, Rico Sennrich

TL;DR
This paper introduces RMSNorm, a simplified and more efficient alternative to LayerNorm that maintains similar performance while significantly reducing computational overhead in neural networks.
Contribution
The paper proposes RMSNorm, a normalization method that drops LayerNorm's re-centering step and re-scales the summed inputs by their root mean square alone, offering lower computational cost with comparable accuracy.
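The difference between the two normalizers can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the `eps` stabilizer, and the omission of a bias term in `rms_norm` are assumptions of this sketch.

```python
import numpy as np

def layer_norm(a, gain, bias, eps=1e-8):
    # LayerNorm: re-center by the mean, then re-scale by the standard deviation.
    mu = a.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(((a - mu) ** 2).mean(axis=-1, keepdims=True) + eps)
    return (a - mu) / sigma * gain + bias

def rms_norm(a, gain, eps=1e-8):
    # RMSNorm: no mean subtraction; divide by the root mean square only.
    # eps is an assumed numerical stabilizer, not part of the summary above.
    rms = np.sqrt((a ** 2).mean(axis=-1, keepdims=True) + eps)
    return a / rms * gain

rng = np.random.default_rng(0)
a = rng.normal(size=(2, 8))   # two example vectors of summed inputs
gain = np.ones(8)             # learnable gain, fixed to 1 for the demo
out = rms_norm(a, gain)       # each row of `out` has RMS close to 1
```

RMSNorm skips one reduction (the mean) and one subtraction per vector, which is where the saving comes from. When the inputs already have zero mean, the two functions give the same result up to the bias term.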
Findings
RMSNorm reduces running time by 7% to 64% compared with LayerNorm, depending on the model.
RMSNorm achieves similar performance to LayerNorm across tasks.
Partial RMSNorm (pRMSNorm) estimates the RMS from only p% of the summed inputs, preserving RMSNorm's invariance properties with less computation.
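The partial variant from the last finding can be sketched as follows. This is a hedged illustration only: the function name `p_rms_norm`, the choice of the first k entries as the subset, the rounding via `ceil`, the default `p=0.25`, and the `eps` stabilizer are all assumptions of this sketch rather than details given in the summary.

```python
import numpy as np

def p_rms_norm(a, gain, p=0.25, eps=1e-8):
    # Estimate the RMS from a fraction p of the entries (assumed here to be
    # the first k = ceil(n * p) entries), then normalize the whole vector.
    n = a.shape[-1]
    k = max(1, int(np.ceil(n * p)))
    partial = a[..., :k]
    rms = np.sqrt((partial ** 2).mean(axis=-1, keepdims=True) + eps)
    return a / rms * gain

rng = np.random.default_rng(0)
a = rng.normal(size=(2, 16))
gain = np.ones(16)

full = p_rms_norm(a, gain, p=1.0)       # p = 1 recovers plain RMSNorm
partial = p_rms_norm(a, gain, p=0.25)   # RMS estimated from 4 of 16 entries
```

Because the estimated RMS still scales linearly with the input, multiplying `a` by a positive constant leaves the output essentially unchanged, which is the property the partial estimate is meant to keep.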
Abstract
Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, e.g. RNN in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to root mean square (RMS), giving the model re-scaling invariance property and implicit learning rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm where the RMS is…
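The re-scaling invariance mentioned in the abstract can be checked directly from the definition. The derivation below is a short sketch; the symbols (a for the summed inputs, g for the gain, n for the layer width, δ for a positive scale factor, c for a constant shift) are notation chosen here for illustration.

```latex
\bar{a}_i = \frac{a_i}{\mathrm{RMS}(\mathbf{a})}\, g_i,
\qquad
\mathrm{RMS}(\mathbf{a}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} a_i^{2}}

% Re-scaling the summed inputs by \delta > 0:
\mathrm{RMS}(\delta\mathbf{a})
  = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \delta^{2} a_i^{2}}
  = \delta\,\mathrm{RMS}(\mathbf{a})
\quad\Longrightarrow\quad
\frac{\delta a_i}{\mathrm{RMS}(\delta\mathbf{a})}\, g_i
  = \frac{a_i}{\mathrm{RMS}(\mathbf{a})}\, g_i

% Re-centering by a constant c is not absorbed:
\mathrm{RMS}(\mathbf{a} + c\mathbf{1}) \neq \mathrm{RMS}(\mathbf{a})
\quad\text{in general}
```

Since the summed inputs are a linear function of the weight matrix and the layer input, scaling either by δ scales a by δ, so the same argument covers both cases. A constant shift, by contrast, changes the output, which is the re-centering invariance that RMSNorm gives up.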
Code & Models
- MindCode-4/code-9/tree/main/root-mean-square-layer-normalization (MindSpore)
- MindSpore-scientific/code-14/tree/main/root-mean-square-layer-normalization (MindSpore)
- MindCode-4/code-8/tree/main/root-mean-square-layer-normalization (MindSpore)
- MindSpore-scientific/code-5/tree/main/root-mean-square-layer-normalization (MindSpore)
- mirage-project/mirage (PyTorch)
- 🤗 google/gemma-scope-2b-pt-transcoders (model · ♡ 13)
- 🤗 baichuan-inc/Baichuan-7B (model · 58k dl · ♡ 842)
- 🤗 fireballoon/baichuan-llama-7b (model · 39 dl · ♡ 23)
- 🤗 TheBloke/baichuan-7B-GPTQ (model · 8 dl · ♡ 14)
- 🤗 TheBloke/baichuan-llama-7B-GGML (model · ♡ 11)
- 🤗 TheBloke/baichuan-llama-7B-GPTQ (model · 3 dl · ♡ 7)
- 🤗 HuggingFaceM4/idefics-80b (model · 331 dl · ♡ 69)
- 🤗 sharpbai/Baichuan-7B (model · 7 dl)
- 🤗 HuggingFaceM4/idefics-9b (model · 1.9k dl · ♡ 47)
- 🤗 refactai/Refact-1_6-base (model · 199 dl · ♡ 5)
Taxonomy
Topics: Neural Networks and Applications · Algorithms and Data Compression · Machine Learning and Algorithms
Methods: Root Mean Square Layer Normalization
