Root Mean Square Layer Normalization

Biao Zhang; Rico Sennrich

arXiv:1910.07467 · cs.LG · October 17, 2019 · 106 cites

TL;DR

This paper introduces RMSNorm, a simplified and more efficient alternative to LayerNorm that maintains similar performance while significantly reducing computational overhead in neural networks.

Contribution

The paper proposes RMSNorm, a normalization method that drops LayerNorm's re-centering step (mean subtraction) and re-scales the summed inputs by their root mean square alone, which is computationally cheaper while giving comparable accuracy.
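
The difference between the two normalizers can be written out side by side. The notation here is an assumption for illustration: a_i is the summed input to neuron i, n is the layer width, and g_i is a learned gain.

```latex
\text{LayerNorm:}\quad
\bar{a}_i = \frac{a_i - \mu}{\sigma}\, g_i,
\qquad \mu = \frac{1}{n}\sum_{i=1}^{n} a_i,
\qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (a_i - \mu)^2}

\text{RMSNorm:}\quad
\bar{a}_i = \frac{a_i}{\mathrm{RMS}(\mathbf{a})}\, g_i,
\qquad \mathrm{RMS}(\mathbf{a}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} a_i^2}
```

When the summed inputs already have zero mean, RMS(a) equals σ and the two normalizers coincide, so RMSNorm saves the mean computation without changing the result in that case.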

Findings

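01 RMSNorm reduces running time by 7% to 64% relative to LayerNorm, depending on the model.

02 RMSNorm achieves performance comparable to LayerNorm across tasks and network architectures.

03 Partial RMSNorm (pRMSNorm) estimates the RMS from only p% of the summed inputs, which needs less computation while preserving RMSNorm's re-scaling invariance.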
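
A minimal sketch of the partial variant, in plain Python. It assumes the RMS is estimated from the first k = ceil(n * p) summed inputs; the function name `p_rms_norm`, the default `p=0.25`, the `eps` term, and the default gain of ones are all illustrative choices, not taken from the paper's code.

```python
import math


def p_rms_norm(a, p=0.25, gain=None, eps=1e-8):
    """Partial RMSNorm sketch: estimate the RMS from a prefix of the inputs.

    a    : list of summed inputs for one layer
    p    : fraction of inputs used for the RMS estimate (hypothetical default)
    gain : per-element gain; defaults to all ones (assumption)
    eps  : small constant to avoid division by zero (assumption)
    """
    n = len(a)
    k = max(1, math.ceil(n * p))  # number of elements used for the estimate
    if gain is None:
        gain = [1.0] * n
    # RMS over the first k elements only; the full vector is still rescaled.
    partial_rms = math.sqrt(sum(x * x for x in a[:k]) / k)
    return [g * x / (partial_rms + eps) for x, g in zip(a, gain)]


a = [3.0, 4.0, 1.0, 2.0, 5.0, 6.0, 7.0, 8.0]
out = p_rms_norm(a, p=0.25)                          # RMS from 2 of 8 inputs
out_scaled = p_rms_norm([2.0 * x for x in a], p=0.25)
```

Because the partial RMS scales linearly with the input, `out` and `out_scaled` agree up to the effect of `eps`, which is the re-scaling invariance the finding refers to. Setting `p=1.0` recovers full RMSNorm.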

Abstract

Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, e.g. RNN in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to root mean square (RMS), giving the model re-scaling invariance property and implicit learning rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm where the RMS is…
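
The properties named in the abstract can be checked with a small sketch in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `eps` term, and the default gain of ones are hypothetical.

```python
import math


def rms_norm(a, gain=None, eps=1e-8):
    """RMSNorm sketch: divide the summed inputs by their root mean square."""
    n = len(a)
    if gain is None:
        gain = [1.0] * n  # assumption: identity gain
    rms = math.sqrt(sum(x * x for x in a) / n)
    return [g * x / (rms + eps) for x, g in zip(a, gain)]


def layer_norm(a, gain=None, eps=1e-8):
    """LayerNorm sketch (no bias): subtract the mean, then divide by the std."""
    n = len(a)
    if gain is None:
        gain = [1.0] * n  # assumption: identity gain
    mean = sum(a) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in a) / n)
    return [g * (x - mean) / (std + eps) for x, g in zip(a, gain)]


a = [1.0, 2.0, 3.0, 4.0]

# Re-scaling invariance: multiplying the inputs by 10 leaves the output unchanged.
base = rms_norm(a)
scaled = rms_norm([10.0 * x for x in a])

# No re-centering invariance: shifting the inputs changes the RMSNorm output,
# whereas LayerNorm removes the shift through mean subtraction.
shifted_rms = rms_norm([x + 5.0 for x in a])
shifted_ln = layer_norm([x + 5.0 for x in a])
```

Here `base` and `scaled` match up to the effect of `eps`, while `shifted_rms` differs from `base` and `shifted_ln` matches `layer_norm(a)`. RMSNorm skips the mean and the centered variance, which is where the computational saving comes from.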

Taxonomy

Topics: Neural Networks and Applications · Algorithms and Data Compression · Machine Learning and Algorithms

Methods: Root Mean Square Layer Normalization