On the Nonlinearity of Layer Normalization

Yunhao Ni; Yuxin Guo; Junlong Jia; Lei Huang

arXiv:2406.01255 · cs.LG · June 4, 2024

TL;DR

This paper provides a theoretical analysis of layer normalization's nonlinearity, demonstrating its capacity to enhance neural network expressiveness and proposing architecture design strategies that exploit this property.

Contribution

The paper introduces a theoretical framework for understanding LN's nonlinearity and representation capacity, including a lower bound on the VC dimension of an LN-Net and a group-partition method for amplifying LN's nonlinearity.

Findings

01. An LN-Net with only 3 neurons per layer and $O(m)$ LN layers can correctly classify any $m$ samples under any label assignment.

02. A lower bound on the VC dimension of an LN-Net is established.

03. Amplifying LN's nonlinearity improves neural network expressiveness.
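
To make the object behind these findings concrete, here is a minimal NumPy sketch of an LN-Net forward pass: linear maps alternated with LN and no activation function. It is an illustration only, not the authors' construction. The random weights, the depth of 8, the `eps` value, and the helper names `layer_norm` and `ln_net` are assumptions introduced here, and LN is taken without learnable scale and shift.

```python
import numpy as np

def layer_norm(x, eps=1e-12):
    """LN over the last axis, without learnable scale/shift (assumption)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ln_net(x, weights, biases):
    """Alternate linear maps and LN; no ReLU or other activation is used."""
    h = x
    for W, b in zip(weights, biases):
        h = layer_norm(h @ W.T + b)
    return h

rng = np.random.default_rng(0)
width, depth, m = 3, 8, 5  # 3 neurons per layer; depth chosen arbitrarily
weights = [rng.normal(size=(width, width)) for _ in range(depth)]
biases = [rng.normal(size=width) for _ in range(depth)]

x = rng.normal(size=(m, width))
out = ln_net(x, weights, biases)

# LN is not additive, so stacking it with linear maps does not
# collapse into a single linear map.
a = np.array([1.0, 2.0, 4.0])
b = np.array([3.0, -1.0, 0.5])
additive_gap = np.abs(layer_norm(a + b) - (layer_norm(a) + layer_norm(b))).max()
print(out.shape, additive_gap)
```

Each output row sums to zero and has norm close to $\sqrt{3}$, so with 3 neurons every LN output lies on a circle. The nonzero `additive_gap` shows that LN itself fails additivity, which is the sense in which it is nonlinear.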

Abstract

Layer normalization (LN) is a ubiquitous technique in deep learning, but our theoretical understanding of it remains elusive. This paper investigates a new theoretical direction for LN, regarding its nonlinearity and representation capacity. We investigate the representation capacity of a network with layerwise composition of linear and LN transformations, referred to as LN-Net. We theoretically show that, given $m$ samples with any label assignment, an LN-Net with only 3 neurons in each layer and $O(m)$ LN layers can correctly classify them. We further show the lower bound of the VC dimension of an LN-Net. The nonlinearity of LN can be amplified by group partition, which is also theoretically demonstrated with a mild assumption and empirically supported by our experiments. Based on our analyses, we consider designing neural architectures by exploiting and amplifying the nonlinearity of…
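
The group partition mentioned in the abstract can be sketched as follows, assuming it means splitting a layer's neurons into equal-sized groups and applying LN to each group separately. The function name `group_layer_norm`, the `num_groups` parameter, and the example input are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def layer_norm(x, eps=1e-12):
    """LN over the last axis, without learnable scale/shift (assumption)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def group_layer_norm(x, num_groups, eps=1e-12):
    """Split the last axis into equal groups and normalize each separately."""
    d = x.shape[-1]
    if d % num_groups != 0:
        raise ValueError("feature dimension must be divisible by num_groups")
    grouped = x.reshape(*x.shape[:-1], num_groups, d // num_groups)
    return layer_norm(grouped, eps).reshape(x.shape)

x = np.array([[1.0, 2.0, 4.0, 10.0, 20.0, 60.0]])
full = layer_norm(x)                         # one mean/variance for all 6 neurons
grouped = group_layer_norm(x, num_groups=2)  # separate statistics per group of 3
print(full)
print(grouped)
```

With `num_groups=1` the function reduces to ordinary LN. With two groups, each half of the vector is centered and scaled by its own statistics, so the output differs from full LN on the same input.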

Taxonomy

Topics: Control Systems and Identification