Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes
Mengye Ren, Renjie Liao, Raquel Urtasun, Fabian H. Sinz, Richard S. Zemel

TL;DR
This paper presents a unified divisive normalization framework that includes batch and layer normalization as special cases. It also shows that a small modification to these schemes, combined with a sparse regularizer on the activations, improves performance across several neural network architectures and tasks.
Contribution
The paper casts batch normalization and layer normalization as forms of divisive normalization, then proposes a small modification to both that, paired with a sparse regularizer on the activations, improves on the standard versions.
Findings
Improved results on image classification, language modeling, and super-resolution.
The unified framework makes batch and layer normalization directly comparable as special cases of one scheme.
The modified normalization schemes outperform their standard counterparts.
Abstract
Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However its success has been very limited when dealing with recurrent neural networks. On the other hand, layer normalization normalizes the activations across all activities within a layer. This was shown to work well in the recurrent setting. In this paper we propose a unified view of normalization techniques, as forms of divisive normalization, which includes layer and batch normalization as special cases. Our second contribution is the finding that a small modification to these normalization schemes, in conjunction with a sparse regularizer on the activations, leads to significant benefits over standard normalization techniques. We demonstrate…
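The distinction the abstract draws can be sketched in a few lines: batch normalization and layer normalization apply the same center-and-divide operation and differ only in which values the statistics are pooled over. The sketch below is a rough illustration under stated assumptions, not the authors' implementation. It uses plain Python lists to stay dependency-free, and the function names, the `sigma` smoothing constant in the denominator, and the L1 form of the sparse penalty are hypothetical choices made for illustration; the text above only says "a small modification" and "a sparse regularizer".

```python
import math


def normalize_vector(values, sigma=0.0, eps=1e-5):
    """Center `values` and divide by sqrt(sigma^2 + variance + eps).

    `sigma` is an assumed smoothing constant; sigma=0 gives ordinary
    standardization. `eps` only guards against division by zero.
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / len(values)
    denom = math.sqrt(sigma ** 2 + variance + eps)
    return [c / denom for c in centered]


def layer_norm(batch, sigma=0.0):
    """Pool statistics over the units of each example (each row)."""
    return [normalize_vector(row, sigma) for row in batch]


def batch_norm(batch, sigma=0.0):
    """Pool statistics over the examples for each unit (each column)."""
    columns = [normalize_vector(list(col), sigma) for col in zip(*batch)]
    # Transpose back to the original (examples x units) layout.
    return [list(row) for row in zip(*columns)]


def l1_activation_penalty(batch, weight=1e-4):
    """Assumed sparse regularizer: weighted mean absolute activation."""
    count = sum(len(row) for row in batch)
    return weight * sum(abs(v) for row in batch for v in row) / count


if __name__ == "__main__":
    # Two examples with three units each (made-up numbers).
    activations = [[1.0, 2.0, 3.0], [4.0, 6.0, 11.0]]
    print(layer_norm(activations))
    print(batch_norm(activations))
    print(batch_norm(activations, sigma=1.0))
    print(l1_activation_penalty(activations))
```

In this sketch, a larger `sigma` shrinks the normalized values toward zero, and the penalty would be added to the training loss. Both are design choices of the illustration rather than details confirmed by the abstract.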
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Seismic Imaging and Inversion Techniques · Model Reduction and Neural Networks
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Layer Normalization · Batch Normalization
