MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization

Wen Fei; Wenrui Dai; Chenglin Li; Junni Zou; Hongkai Xiong

arXiv:2010.09278·cs.LG·October 30, 2024·1 cites

TL;DR

MimicNorm is a simplified normalization method that mimics Batch Normalization's effects using lightweight operations, reducing memory usage while maintaining accuracy and convergence benefits across various neural network architectures.

Contribution

The paper introduces MimicNorm, a normalization method that replaces BN with weight mean operations and a single BN layer before the loss function. This reduces the memory and computation that BN requires, and the paper analyzes the method using neural tangent kernel (NTK) theory.

Findings

01

Achieves similar accuracy to BN across multiple architectures

02

Reduces memory consumption by approximately 20%

03

Supports the convergence benefit with a neural tangent kernel (NTK) analysis showing that the weight mean operation whitens activations

Abstract

Extensive experiments have validated the success of Batch Normalization (BN) layers in improving convergence and generalization. However, BN requires extra memory and floating-point computation. Moreover, BN is inaccurate on micro-batches, because it depends on batch statistics. In this paper, we address these problems by simplifying BN regularization while keeping two fundamental effects of BN layers, i.e., data decorrelation and adaptive learning rate. We propose a novel normalization method, named MimicNorm, to improve convergence and efficiency in network training. MimicNorm consists of only two lightweight operations: a modified weight mean operation (subtracting the mean values from the weight parameter tensor) and one BN layer before the loss function (the last BN layer). We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transits…
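
The two operations named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the official repository uses PyTorch). The function names `weight_mean_center` and `last_batch_norm`, the choice to average each output unit's weights over all remaining axes, the absence of affine parameters in the last BN layer, and the toy two-layer network are all assumptions made for illustration.

```python
import numpy as np


def weight_mean_center(weight):
    """Subtract the mean from a weight tensor.

    Assumption: the mean is taken per output unit (axis 0), over all
    remaining axes. The abstract only says "subtract mean values from
    weight parameter tensor", so the axis choice here is hypothetical.
    """
    axes = tuple(range(1, weight.ndim))
    return weight - weight.mean(axis=axes, keepdims=True)


def last_batch_norm(logits, eps=1e-5):
    """Normalize logits with batch statistics (training mode, no affine terms)."""
    mean = logits.mean(axis=0, keepdims=True)
    var = logits.var(axis=0, keepdims=True)
    return (logits - mean) / np.sqrt(var + eps)


# Toy two-layer network: no BN between layers, one BN before the loss.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, size=(8, 16))   # batch of 8 inputs, 16 features
w1 = rng.normal(size=(32, 16))          # hidden layer weights
w2 = rng.normal(size=(10, 32))          # output layer weights

hidden = np.maximum(x @ weight_mean_center(w1).T, 0.0)  # ReLU activation
logits = hidden @ weight_mean_center(w2).T
out = last_batch_norm(logits)           # the single "last BN layer"
```

In this sketch the intermediate layers store no batch statistics, which is where a method of this shape would save memory relative to placing a BN layer after every layer; only the final normalization still depends on the batch.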

Code & Models

Repositories

Kid-key/MimicNorm (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: 1x1 Convolution · Grouped Convolution · Average Pooling · Residual Connection · Channel Shuffle · Groupwise Point Convolution · Max Pooling · Pointwise Convolution · Softmax