Comprehensive Online Network Pruning via Learnable Scaling Factors
Muhammad Umair Haider, Murtaza Taj

TL;DR
This paper introduces a unified, learnable gating method for both width-wise and depth-wise pruning of neural networks, enabling significant compression with minimal accuracy loss across various architectures.
Contribution
It presents a comprehensive pruning framework that adaptively prunes at multiple granularities using learnable gates, applicable to diverse neural network architectures.
Findings
Achieved compression ratios of 70-90% with minimal accuracy loss.
Applicable to various architectures without constraints.
Unified approach for width-wise and depth-wise pruning.
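To make the compression figures above concrete, here is a minimal sketch of how such a ratio could be computed. It assumes the ratio means the fraction of parameters removed, which this summary does not define; the function name and the layer sizes are illustrative and not taken from the paper.

```python
def compression_ratio(params_before: int, params_after: int) -> float:
    """Fraction of parameters removed by pruning (assumed definition)."""
    if params_before <= 0:
        raise ValueError("params_before must be positive")
    return 1.0 - params_after / params_before


# Illustrative numbers only: a 3x3 convolution with 64 input channels.
weights_per_filter = 3 * 3 * 64
before = 128 * weights_per_filter  # 128 filters before pruning
after = 32 * weights_per_filter    # 32 filters kept after pruning
ratio = compression_ratio(before, after)
```

Under this assumed definition, keeping a quarter of the filters in a layer removes three quarters of that layer's weights.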
Abstract
One of the major challenges in deploying deep neural network architectures is their size, which has an adverse effect on their inference time and memory requirements. Deep CNNs can be pruned either width-wise, by removing filters based on their importance, or depth-wise, by removing layers and blocks. Width-wise pruning (filter pruning) is commonly performed via learnable gates or switches and sparsity regularizers, whereas pruning of layers has so far been performed arbitrarily by manually designing a smaller network, usually referred to as a student network. We propose a comprehensive pruning strategy that can perform both width-wise and depth-wise pruning. This is achieved by introducing gates at different granularities (neuron, filter, layer, block), which are then controlled via an objective function that simultaneously performs pruning at different granularities during each forward…
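The gating idea in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the L1 penalty, and the threshold are hypothetical, and the paper's actual objective function is not given in this summary.

```python
# Hypothetical sketch: all names, the L1 penalty, and the threshold are
# illustrative assumptions, not taken from the paper.

def gated_block(x, filters, filter_gates, block_gate):
    """Toy residual block with width-wise and depth-wise scaling factors.

    x            : list of floats (input features)
    filters      : list of callables, one per output feature, each x -> float
    filter_gates : one scaling factor per filter (width-wise gates)
    block_gate   : one scaling factor for the whole block (depth-wise gate)
    """
    body = [g * f(x) for g, f in zip(filter_gates, filters)]
    # Skip connection: when block_gate is 0 the block reduces to the identity,
    # so the whole block can be dropped.
    return [xi + block_gate * bi for xi, bi in zip(x, body)]


def sparsity_penalty(gates, strength):
    """L1 penalty pushing gates toward zero (an assumed regularizer)."""
    return strength * sum(abs(g) for g in gates)


def prune_mask(gates, threshold=1e-2):
    """True for units to keep, False for units whose gate is near zero."""
    return [abs(g) > threshold for g in gates]


x = [1.0, 2.0]
filters = [lambda v: v[0] + v[1], lambda v: v[0] - v[1]]
kept_filter = gated_block(x, filters, [1.0, 0.0], 1.0)  # second filter gated off
dropped_block = gated_block(x, filters, [1.0, 1.0], 0.0)  # whole block gated off
```

In a real network the gates would be trainable parameters updated together with the weights, and the penalty would be added to the task loss; both are omitted here to keep the sketch self-contained.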
Taxonomy
Methods: Pruning
