MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization
Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

TL;DR
MimicNorm is a simplified normalization method that mimics Batch Normalization's effects using lightweight operations, reducing memory usage while maintaining accuracy and convergence benefits across various neural network architectures.
Contribution
The paper introduces MimicNorm, a normalization technique that replaces the BN layers of a network with weight mean operations plus a single BN layer before the loss function. This lowers the memory cost of training, and the paper supports the design with a theoretical analysis.
Findings
Achieves similar accuracy to BN across multiple architectures
Reduces memory consumption by approximately 20%
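Supports the convergence claims with an analysis based on neural tangent kernel (NTK) theory, which is used to show that the weight mean operation whitens activations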
Abstract
Substantial experiments have validated the success of Batch Normalization (BN) layers in benefiting convergence and generalization. However, BN requires extra memory and floating-point computation. Moreover, BN is inaccurate on micro-batches, as it depends on batch statistics. In this paper, we address these problems by simplifying BN regularization while keeping two fundamental impacts of BN layers, i.e., data decorrelation and adaptive learning rate. We propose a novel normalization method, named MimicNorm, to improve convergence and efficiency in network training. MimicNorm consists of only two lightweight operations: a modified weight mean operation (subtracting mean values from the weight parameter tensor) and one BN layer before the loss function (the last BN layer). We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transits…
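The two operations named in the abstract can be illustrated with a small sketch in plain Python. This is not the authors' implementation. The abstract only says that mean values are subtracted from the weight tensor, so the choice to take one mean per output unit (per row of a 2-D weight matrix) is an assumption made here, and the function names `center_weights`, `linear`, and `batch_norm` are hypothetical. The BN sketch uses batch statistics only, without learnable scale and shift or running averages.

```python
from statistics import fmean


def center_weights(weight):
    """Subtract a mean value from the weights (the "weight mean" step).

    Assumption: `weight` is a list of rows, one row per output unit,
    and the mean is taken separately over each row.
    """
    centered = []
    for row in weight:
        row_mean = fmean(row)
        centered.append([w - row_mean for w in row])
    return centered


def linear(x, weight):
    """Compute y = W x for a single input vector x (no bias term)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature over the batch to zero mean, unit variance.

    Sketch of the single BN layer placed before the loss function.
    """
    columns = list(zip(*batch))
    means = [fmean(col) for col in columns]
    variances = [
        fmean([(v - m) ** 2 for v in col]) for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / (var + eps) ** 0.5 for v, m, var in zip(row, means, variances)]
        for row in batch
    ]


# Hypothetical toy layer: 2 output units, 3 inputs.
weight = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
centered = center_weights(weight)   # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]

# Each centered row sums to zero, so a constant input maps to zero.
constant_out = linear([5.0, 5.0, 5.0], centered)   # [0.0, 0.0]

# Intermediate layers use centered weights; only the final logits get BN.
inputs = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [2.0, 1.0, 5.0]]
logits = [linear(x, centered) for x in inputs]
normalized_logits = batch_norm(logits)
```

Because each centered row sums to zero, a constant offset added to every input of a layer does not change that layer's output. This is one simple way to see how mean subtraction on the weights can stand in for centering the activations, under the per-row assumption stated above.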
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: 1x1 Convolution · Grouped Convolution · Average Pooling · Residual Connection · Channel Shuffle · Groupwise Point Convolution · Max Pooling · Pointwise Convolution · Softmax
