
TL;DR
This paper gives a geometric and mathematical analysis of LayerNorm. It shows that LayerNorm acts as a composition of a linear projection, a nonlinear scaling, and an affine transformation, and it characterizes the output space as the intersection of a hyperplane and the interior of a hyperellipsoid.
Contribution
It introduces a new mathematical expression and geometric intuition for LayerNorm, clarifying its operation and output geometry in neural networks.
Findings
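For an N-dimensional input space, LayerNorm outputs lie within the intersection of an (N-1)-dimensional hyperplane and the interior of an N-dimensional hyperellipsoid.
This intersection is the interior of an (N-1)-dimensional hyperellipsoid, and typical inputs are mapped near its surface.
Eigen-decomposition gives the direction and length of the hyperellipsoid's principal axes.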
Abstract
A technical note aiming to offer deeper intuition for the LayerNorm function common in deep neural networks. LayerNorm is defined relative to a distinguished 'neural' basis, but it does more than just normalize the corresponding vector elements. Rather, it implements a composition -- of linear projection, nonlinear scaling, and then affine transformation -- on input activation vectors. We develop both a new mathematical expression and geometric intuition, to make the net effect more transparent. We emphasize that, when LayerNorm acts on an N-dimensional vector space, all outcomes of LayerNorm lie within the intersection of an (N-1)-dimensional hyperplane and the interior of an N-dimensional hyperellipsoid. This intersection is the interior of an (N-1)-dimensional hyperellipsoid, and typical inputs are mapped near its surface. We find the direction and length of the principal axes of…
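The composition described in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's own formulation: it assumes the common LayerNorm definition (biased variance, a small `eps` added inside the square root, elementwise gain `gamma` and bias `beta`). The function name `layernorm_steps`, the random test values, and `eps=1e-5` are illustrative choices. After the affine step is undone, the result should sum to zero (the hyperplane) and have squared norm below N (the interior of the ellipsoid, which in these rescaled coordinates is a sphere of radius sqrt(N)).

```python
import numpy as np

def layernorm_steps(x, gamma, beta, eps=1e-5):
    """LayerNorm on a 1-D vector, written as three explicit steps.

    Hypothetical helper for illustration; eps=1e-5 is an assumed default.
    """
    n = x.shape[0]
    ones = np.ones(n)
    # 1) Linear projection: remove the component along the all-ones
    #    direction, which is the same as subtracting the mean.
    projector = np.eye(n) - np.outer(ones, ones) / n
    centered = projector @ x
    # 2) Nonlinear scaling: divide by sqrt(biased variance + eps).
    var = centered @ centered / n
    normalized = centered / np.sqrt(var + eps)
    # 3) Affine transformation: elementwise gain and bias.
    return normalized, gamma * normalized + beta

rng = np.random.default_rng(0)
n = 8
gamma = rng.uniform(0.5, 2.0, size=n)   # assumed positive gains
beta = rng.normal(size=n)
x = rng.normal(size=n) * 3.0 + 1.0

y, z = layernorm_steps(x, gamma, beta)

# Undo the affine step to test the geometric constraints on the output z.
w = (z - beta) / gamma
hyperplane_residual = abs(w.sum())     # ~0: z lies on a hyperplane
ellipsoid_value = np.sum(w**2) / n     # < 1: z lies inside the ellipsoid
print(hyperplane_residual, ellipsoid_value)
```

Under these assumptions `ellipsoid_value` equals var / (var + eps). It is therefore always below 1, and close to 1 whenever the input variance is much larger than `eps`, which is consistent with the statement that typical inputs are mapped near the surface.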
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Advanced Neural Network Applications
