The Hidden Power of Normalization Layers in Neural Networks: Exponential Capacity Control

Khoat Than

arXiv:2511.00958 · cs.LG · February 24, 2026

TL;DR

This paper gives a theoretical explanation of how normalization layers control the capacity of a neural network: they can shrink its Lipschitz constants exponentially, which stabilizes training and improves generalization, consistent with their empirical success.

Contribution

The paper introduces a capacity-control framework showing that normalization layers can exponentially reduce Lipschitz constants, which explains their role in both optimization and generalization.

Findings

01. Normalization layers exponentially reduce Lipschitz constants.

02. They smooth the loss landscape exponentially.

03. They constrain network capacity, improving generalization.

Abstract

Normalization layers are critical components of modern AI systems such as ChatGPT, Gemini, and DeepSeek. Empirically, they are known to stabilize training dynamics and improve generalization ability. However, the underlying theoretical mechanism by which normalization layers contribute to both optimization and generalization remains largely unexplained, especially when many normalization layers are used in a deep neural network (DNN). In this work, we develop a theoretical framework that elucidates the role of normalization through the lens of capacity control. We prove that an unnormalized DNN can exhibit exponentially large Lipschitz constants with respect to either its parameters or inputs, implying excessive functional capacity and potential overfitting. There are uncountably many such bad DNNs. In contrast, the insertion of normalization layers can provably reduce the Lipschitz constant…
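To make the "exponentially large Lipschitz constants" claim for unnormalized networks concrete, here is a minimal NumPy sketch. It is not the paper's construction. It uses the standard product-of-spectral-norms upper bound for a feed-forward network with 1-Lipschitz activations such as ReLU. The helper names (`spectral_norm`, `lipschitz_upper_bound`, `make_weights`), the width of 16, and the per-layer spectral norm of 2 are all assumptions chosen for illustration.

```python
import numpy as np


def spectral_norm(w):
    """Largest singular value of a 2-D weight matrix."""
    return float(np.linalg.norm(w, 2))


def lipschitz_upper_bound(weights):
    """Product of layer spectral norms.

    This is the standard upper bound on the input Lipschitz constant of a
    feed-forward network whose activations are 1-Lipschitz (e.g. ReLU).
    It is an illustrative bound, not the bound derived in the paper.
    """
    return float(np.prod([spectral_norm(w) for w in weights]))


def make_weights(depth, width, scale, seed=0):
    """Random square layers rescaled so each has spectral norm `scale`.

    `depth`, `width`, and `scale` are hypothetical illustration parameters.
    """
    rng = np.random.default_rng(seed)
    weights = []
    for _ in range(depth):
        w = rng.standard_normal((width, width))
        weights.append(scale * w / spectral_norm(w))
    return weights


# With every layer at spectral norm 2, the bound doubles with each layer,
# so it grows as 2**depth.
bounds = {d: lipschitz_upper_bound(make_weights(d, 16, 2.0)) for d in (1, 5, 10, 20)}
for depth, bound in bounds.items():
    print(f"depth={depth:2d}  bound={bound:.1f}")
```

The sketch covers only the unnormalized side of the argument: when each layer's norm exceeds 1, the bound grows geometrically with depth. How normalization layers shrink this quantity is the subject of the paper's analysis and is not reproduced here.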

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science · Neural Networks and Reservoir Computing