Geometric Interpretation of Layer Normalization and a Comparative Analysis with RMSNorm
Akshat Gupta, Atahan Ozdemir, Gopala Anumanchipalli

TL;DR
This paper offers a geometric perspective on LayerNorm, revealing its connection to the uniform vector, and demonstrates that RMSNorm can replace LayerNorm for improved efficiency without loss of performance.
Contribution
It provides a new geometric interpretation of LayerNorm, compares it with RMSNorm, and shows that LayerNorm's removal of the component along the uniform vector is redundant in practice, which supports using RMSNorm instead.
Findings
LayerNorm is linked to the uniform vector in representation space.
Standardization in LayerNorm involves removing the uniform component, normalizing, and scaling.
Hidden vectors in LLMs are already orthogonal to the uniform vector at inference time, making LayerNorm's removal of that component redundant.
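The redundancy claim in the findings above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: it assumes the uniform vector is the all-ones vector, omits the learnable gain and bias of both layers, and uses hypothetical function names. It illustrates that RMSNorm and LayerNorm's standardization agree on any vector that is already orthogonal to the uniform vector (that is, has zero mean), and generally differ otherwise.

```python
import numpy as np

def layernorm_standardize(x):
    # LayerNorm without learnable gain/bias: subtract the mean,
    # divide by the (population) standard deviation.
    return (x - x.mean()) / x.std()

def rmsnorm(x):
    # RMSNorm without learnable gain: divide by the root mean square.
    return x / np.sqrt(np.mean(x ** 2))

rng = np.random.default_rng(0)
d = 16                      # illustrative dimensionality (assumption)
x = rng.normal(size=d)
ones = np.ones(d)           # uniform vector, assumed to be all ones

# Project out the component along the uniform vector.
x_orth = x - (x @ ones) / d * ones

# On a vector orthogonal to the uniform vector, the two layers coincide.
same_on_orth = np.allclose(layernorm_standardize(x_orth), rmsnorm(x_orth))

# On a generic vector with nonzero mean, they differ.
same_on_raw = np.allclose(layernorm_standardize(x), rmsnorm(x))
```

Under these assumptions, `same_on_orth` is true because a zero-mean vector has standard deviation equal to its root mean square.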
Abstract
This paper presents a novel geometric interpretation of LayerNorm and explores how LayerNorm influences the norm and orientation of hidden vectors in the representation space. With these geometric insights, we prepare the foundation for comparing LayerNorm with RMSNorm. We show that the definition of LayerNorm is innately linked to the uniform vector, defined as $\mathbf{1} = [1, 1, \ldots, 1]^T \in \mathbb{R}^d$. We then show that the standardization step in LayerNorm can be understood in three simple steps: (i) remove the component of a vector along the uniform vector, (ii) normalize the remaining vector, and (iii) scale the resultant vector by $\sqrt{d}$, where $d$ is the dimensionality of the representation space. We also provide additional insights into how LayerNorm operates at inference time. Finally, we compare the hidden representations of LayerNorm-based LLMs with…
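The three-step description in the abstract can be sketched in a few lines of NumPy. This is an illustrative reconstruction rather than the authors' implementation: it takes the uniform vector to be the all-ones vector in d dimensions, takes the scale factor to be the square root of d, ignores the learnable gain and bias, and uses hypothetical function names.

```python
import numpy as np

def layernorm_geometric(x):
    d = x.shape[-1]
    ones = np.ones(d)                       # uniform vector (assumed all ones)
    # (i) remove the component of x along the uniform vector
    x_perp = x - (x @ ones) / d * ones
    # (ii) normalize the remaining vector to unit length
    unit = x_perp / np.linalg.norm(x_perp)
    # (iii) scale the result by sqrt(d)
    return np.sqrt(d) * unit

def layernorm_standard(x):
    # Textbook standardization: zero mean, unit (population) variance.
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(1)
d = 8                                       # illustrative dimensionality
x = rng.normal(size=d)
y = layernorm_geometric(x)
```

The two functions agree because the standard deviation of `x` equals the norm of its mean-removed part divided by the square root of d. As a consequence, the output always has norm equal to the square root of d and is orthogonal to the uniform vector.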
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI) · Topic Modeling
Methods: Root Mean Square Layer Normalization · ALIGN
