The Geometric Cost of Normalization: Affine Bounds on the Bayesian Complexity of Neural Networks

Sungbae Chun

arXiv:2603.27432·cs.LG·March 31, 2026

TL;DR

This paper shows that LayerNorm and RMSNorm impose different geometric constraints on their outputs, proves how each constraint affects the Local Learning Coefficient (LLC) of the subsequent weight matrix, and validates the predictions experimentally.

Contribution

It provides a geometric analysis of LayerNorm and RMSNorm that quantifies their effect on model complexity through the Local Learning Coefficient (LLC), identifies a curvature threshold that decides whether the LLC drops, and supports the theory with experiments.

Findings

01

LayerNorm's mean-centering reduces the LLC of the subsequent weight matrix by exactly m/2, where m is that matrix's output dimension.

02

RMSNorm projects its outputs onto a sphere and preserves the LLC entirely.

03

For the codimension-one manifolds studied, the LLC drop is binary: any non-zero curvature preserves the LLC, and only affinely flat manifolds cause the drop.
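The geometry behind the first two findings can be checked numerically. The following is a minimal NumPy sketch, not the paper's code: it assumes both normalizations are applied without learnable gain or bias, and the function names, sample sizes, and the `+ 1.0` input offset are illustrative choices. It shows that LayerNorm outputs lie on the hyperplane where the coordinates sum to zero, that RMSNorm outputs lie on a sphere of radius sqrt(d), and that on the hyperplane a subsequent weight matrix W gives the same outputs as W + v 1^T for any v in R^m. That is an m-dimensional family of equivalent weights, which is consistent with the stated m/2 reduction, but the sketch does not estimate the LLC itself.

```python
import numpy as np


def layer_norm(x, eps=0.0):
    """LayerNorm without learnable gain/bias (an assumption of this sketch)."""
    centered = x - x.mean(axis=-1, keepdims=True)
    var = (centered ** 2).mean(axis=-1, keepdims=True)
    return centered / np.sqrt(var + eps)


def rms_norm(x, eps=0.0):
    """RMSNorm without learnable gain (an assumption of this sketch)."""
    mean_square = (x ** 2).mean(axis=-1, keepdims=True)
    return x / np.sqrt(mean_square + eps)


rng = np.random.default_rng(0)
n, d, m = 256, 8, 3                      # samples, input dim, output dim (illustrative)
x = rng.normal(size=(n, d)) + 1.0        # offset so the per-sample mean is not near zero

ln_out = layer_norm(x)
rms_out = rms_norm(x)

# LayerNorm: every output sums to zero, i.e. lies on a hyperplane through the origin.
ln_coord_sum = np.abs(ln_out.sum(axis=-1)).max()

# RMSNorm: every output has Euclidean norm sqrt(d), i.e. lies on a sphere,
# and in general does not sum to zero.
rms_norms = np.linalg.norm(rms_out, axis=-1)
rms_coord_sum = np.abs(rms_out.sum(axis=-1)).max()

# Subsequent weight matrix W (m x d) and a shifted copy W + v 1^T.
W = rng.normal(size=(m, d))
v = rng.normal(size=(m, 1))
W_shifted = W + v @ np.ones((1, d))

# Largest output difference between W and W_shifted on each kind of input.
ln_gap = np.abs(ln_out @ W.T - ln_out @ W_shifted.T).max()
rms_gap = np.abs(rms_out @ W.T - rms_out @ W_shifted.T).max()

print("LayerNorm max |sum of coords|:", ln_coord_sum)
print("RMSNorm norms all sqrt(d):", np.allclose(rms_norms, np.sqrt(d)))
print("Output gap after LayerNorm:", ln_gap)
print("Output gap after RMSNorm:", rms_gap)
```

The gap after LayerNorm is at floating-point rounding level, while the gap after RMSNorm is clearly non-zero: the shift v 1^T is invisible only to data confined to the flat hyperplane.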

Abstract

LayerNorm and RMSNorm impose fundamentally different geometric constraints on their outputs, and this difference has a precise, quantifiable consequence for model complexity. We prove that LayerNorm's mean-centering step, by confining data to a linear hyperplane (through the origin), reduces the Local Learning Coefficient (LLC) of the subsequent weight matrix by exactly $m/2$ (where $m$ is its output dimension); RMSNorm's projection onto a sphere preserves the LLC entirely. This reduction is structurally guaranteed before any training begins, determined by data manifold geometry alone. The underlying condition is a geometric threshold: for the codimension-one manifolds we study, the LLC drop is binary: any non-zero curvature, regardless of sign or magnitude, is sufficient to preserve the LLC, while only affinely flat manifolds cause the drop. At finite sample sizes this threshold…
