Parallel Layer Normalization for Universal Approximation

Yunhao Ni; Yuxin Guo; Yuhe Liu; Wenxin Sun; Jie Luo; Wenjun Wu; Lei Huang

arXiv:2505.13142·cs.LG·February 10, 2026

Open Access

TL;DR

This paper proves that neural networks with parallel layer normalization (PLN) layers are universal approximators, whereas networks that use only standard LN have strictly limited expressive power. The theoretical analysis covers several architectures and is accompanied by empirical experiments.

Contribution

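The paper introduces PLN-Nets, networks of two linear layers with parallel layer normalizations inserted between them, and proves that they achieve universal approximation. It extends the analysis to RMSNorm and to position-wise feed-forward networks, the core building blocks used in RNNs and Transformers.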

Findings

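01 PLN-Nets achieve universal approximation, whereas architectures that use only standard LN exhibit strictly limited expressive power.

02 Approximation rates of shallow and deep PLN-Nets are analyzed under the $L^{\infty}$ norm and in Sobolev norms.

03 Empirical experiments explore further potential uses of PLN-Nets.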

Abstract

This paper studies the approximation capabilities of neural networks that combine layer normalization (LN) with linear layers. We prove that networks consisting of two linear layers with parallel layer normalizations (PLNs) inserted between them (referred to as PLN-Nets) achieve universal approximation, whereas architectures that use only standard LN exhibit strictly limited expressive power. We further analyze approximation rates of shallow and deep PLN-Nets under the $L^{\infty}$ norm as well as in Sobolev norms. Our analysis extends beyond LN to RMSNorm, and from standard MLPs to position-wise feed-forward networks, the core building blocks used in RNNs and Transformers. Finally, we provide empirical experiments to explore further potential of PLN-Nets.
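
The architecture described in the abstract (linear layer, parallel layer normalizations, linear layer) can be illustrated with a minimal sketch. The abstract does not define PLN precisely, so this sketch assumes that the hidden units are split into equal-sized groups and that a standard affine-free LN is applied to each group independently. All function names, the `group_size` parameter, and the random initialization are hypothetical and are not taken from the paper.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Standard LN without affine parameters: center and scale one vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def parallel_layer_norm(h, group_size):
    """ASSUMPTION: PLN = independent LN over consecutive groups of hidden units."""
    if len(h) % group_size != 0:
        raise ValueError("hidden width must be divisible by group_size")
    out = []
    for start in range(0, len(h), group_size):
        out.extend(layer_norm(h[start:start + group_size]))
    return out


def linear(x, weight, bias):
    """Affine map: weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def pln_net(x, w1, b1, w2, b2, group_size):
    """Two linear layers with parallel layer normalizations in between."""
    hidden = linear(x, w1, b1)
    hidden = parallel_layer_norm(hidden, group_size)
    return linear(hidden, w2, b2)


def random_matrix(rows, cols, rng):
    """Hypothetical helper: uniform random weights, for illustration only."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_in, d_hidden, d_out = 3, 6, 2
    w1, b1 = random_matrix(d_hidden, d_in, rng), [0.0] * d_hidden
    w2, b2 = random_matrix(d_out, d_hidden, rng), [0.0] * d_out
    # Three groups of two hidden units, each normalized on its own.
    print(pln_net([0.1, 0.2, 0.3], w1, b1, w2, b2, group_size=2))
```

Under this assumption, the normalization is the only nonlinearity in the network: each group is centered and rescaled on its own, so the number of groups plays a role similar to the width of a conventional activation layer. Setting `group_size` to the full hidden width recovers a single standard LN between the two linear layers.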

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Advanced Graph Neural Networks

Methods: Layer Normalization