How Does BN Increase Collapsed Neural Network Filters?
Sheng Zhou, Xinjiang Wang, Ping Luo, Litong Feng, Wenjie Li, Wei Zhang

TL;DR
This paper investigates how batch normalization causes filter collapse in deep neural networks, analyzes the underlying causes, and proposes a simple method called post-shifted BN to recover filters and improve performance.
Contribution
It reveals the harmful effect of BN on filter sparsity, provides an analytical explanation, and introduces post-shifted BN (psBN) to prevent collapse and enhance model accuracy.
Findings
BN induces filter collapse even without explicit regularization.
High learning rates exacerbate filter sparsity caused by BN.
Post-shifted BN effectively recovers collapsed filters and improves performance.
Abstract
Improving the sparsity of deep neural networks (DNNs) is essential for network compression and has drawn much attention. In this work, we disclose a harmful sparsifying process called filter collapse, which is common in DNNs with batch normalization (BN) and rectified linear activation functions (e.g., ReLU, Leaky ReLU). It occurs even without explicit sparsity-inducing regularization. This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result. It becomes more prominent when the network is trained with large learning rates (LR) or adaptive LR schedulers, and when the network is finetuned. We analytically prove that the parameters of BN tend to become sparser during SGD updates with high gradient noise and that the sparsifying probability is proportional to…
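The non-trainable region mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's code: the function names, the tolerance, and the chosen gamma and beta values are assumptions made for illustration, and only plain ReLU is covered. A channel is treated as collapsed here when its BN output is non-positive for every sample in the batch, so ReLU outputs zero and no gradient flows back to that channel's BN parameters.

```python
import numpy as np

def batchnorm_relu(x, gamma, beta, eps=1e-5):
    """Per-channel batch normalization followed by ReLU.

    x has shape (batch, channels); gamma and beta have shape (channels,).
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return np.maximum(gamma * x_hat + beta, 0.0)

def collapsed_fraction(activations, tol=1e-6):
    """Fraction of channels whose post-ReLU output is ~0 for every sample."""
    dead = np.all(activations <= tol, axis=0)
    return dead.mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4))

# Hypothetical BN parameters: the last two channels have a small scale and a
# negative shift, so gamma * x_hat + beta stays below zero for the whole batch.
gamma = np.array([1.0, 1.0, 0.01, 0.01])
beta = np.array([0.0, 0.5, -1.0, -2.0])

out = batchnorm_relu(x, gamma, beta)
frac = collapsed_fraction(out)
print(f"collapsed channels: {frac:.0%}")
```

In this toy setting the ReLU mask is zero everywhere for the last two channels, so a gradient step cannot change their gamma or beta. That is one way to picture how such filters stop contributing to network capacity.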
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques
Methods: Batch Normalization · Stochastic Gradient Descent
