Cascade Weight Shedding in Deep Neural Networks: Benefits and Pitfalls for Network Pruning
Kambiz Azarian, Fatih Porikli

TL;DR
This paper investigates the cascade weight shedding phenomenon in deep neural networks, revealing its impact on pruning effectiveness, and proposes methods to leverage it for improved performance and reduced complexity.
Contribution
The paper introduces the concept of cascade weight shedding, explains its role in network pruning, and explores how it can benefit pruning methods such as global magnitude-based pruning (GMP) as well as semi-structured pruning.
Findings
Cascade weight shedding can significantly improve pruning performance.
Global magnitude-based pruning remains competitive across scenarios.
Rewinding methods relate to cascade shedding and offer advantages.
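For readers unfamiliar with global magnitude-based pruning, the following is a minimal NumPy sketch of the general technique: rank all weights across layers by absolute value and drop the smallest ones using a single global threshold. It is an illustration only, not the authors' implementation. The function name `global_magnitude_masks` and the `sparsity` parameter are hypothetical, and ties at the threshold are simply pruned together.

```python
import numpy as np

def global_magnitude_masks(weights, sparsity):
    """Return one boolean keep-mask per layer, using one global threshold.

    weights  : list of NumPy arrays, one per layer (hypothetical layout)
    sparsity : fraction of all weights to prune, in [0, 1)
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")

    # Pool the magnitudes of every layer so that layers compete globally.
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(sparsity * all_mags.size)  # number of weights to prune
    if k == 0:
        return [np.ones(w.shape, dtype=bool) for w in weights]

    # The k-th smallest magnitude serves as the global threshold.
    threshold = np.partition(all_mags, k - 1)[k - 1]

    # Keep only weights strictly above the threshold (ties are pruned).
    return [np.abs(w) > threshold for w in weights]

# Toy example: two "layers" with eight weights in total.
layers = [np.array([0.1, -0.5, 2.0, 0.01]),
          np.array([[0.05, -3.0], [0.7, 0.2]])]
masks = global_magnitude_masks(layers, sparsity=0.5)
pruned_layers = [w * m for w, m in zip(layers, masks)]
```

Because the threshold is shared, layers with many small weights lose a larger share of their parameters than layers with large weights, which is the defining difference from per-layer magnitude pruning.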
Abstract
We report, for the first time, on the cascade weight shedding phenomenon in deep neural networks where, in response to pruning a small percentage of a network's weights, a large percentage of the remaining weights is shed over a few epochs during the ensuing fine-tuning phase. We show that cascade weight shedding, when present, can significantly improve the performance of an otherwise sub-optimal scheme such as random pruning. This explains why some pruning methods may perform well under certain circumstances but poorly under others, e.g., ResNet50 vs. MobileNetV3. We provide insight into why global magnitude-based pruning (GMP), despite its simplicity, provides competitive performance across a wide range of scenarios. We also demonstrate cascade weight shedding's potential for improving GMP's accuracy and reducing its computational complexity. In doing so, we highlight the importance of…
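One way to look for the shedding effect described in the abstract is to track, after each fine-tuning epoch, what fraction of the unpruned weights has collapsed to near zero. The sketch below assumes that a "shed" weight is one whose magnitude falls below a small cutoff `eps`; that definition, the cutoff value, and the function name `shed_fraction` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def shed_fraction(weights, masks, eps=1e-3):
    """Fraction of unpruned weights whose magnitude is below eps.

    weights : list of NumPy arrays, one per layer, after some fine-tuning
    masks   : list of boolean keep-masks of matching shapes
    eps     : assumed cutoff below which a weight counts as "shed"
    """
    # Collect magnitudes of the weights that survived explicit pruning.
    kept = np.concatenate([np.abs(w)[m] for w, m in zip(weights, masks)])
    if kept.size == 0:
        return 0.0
    return float(np.mean(kept < eps))

# Toy example: one layer, first weight explicitly pruned,
# one of the three survivors has since collapsed towards zero.
example_weights = [np.array([0.0, 1e-5, 0.5, 2.0])]
example_masks = [np.array([False, True, True, True])]
fraction = shed_fraction(example_weights, example_masks)
```

Under this assumed definition, a value that rises sharply over a few epochs after a small pruning step would be consistent with the cascade behaviour the abstract describes, while a value that stays flat would suggest the effect is absent for that network.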
Taxonomy
Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning
Methods: Pruning · ReLU6 · Depthwise Convolution · Pointwise Convolution · Sigmoid Activation · Depthwise Separable Convolution · Dense Connections · Hard Swish · Average Pooling · Batch Normalization
