Channel-wise pruning of neural networks with tapering resource constraint
Alexey Kruglov

TL;DR
This paper introduces a novel compute-constrained channel-wise pruning method for convolutional neural networks that uses a holonomic constraint and automatic control to efficiently reduce model complexity while maintaining accuracy.
Contribution
It proposes a new pruning approach based on constrained optimization with a holonomic constraint, avoiding issues of previous methods like ADMM, and provides adaptive control over resource tapering.
Findings
Achieves significant GMAC reduction with minimal accuracy loss on VGG-16.
Reduces GMAC substantially on AlexNet with only 1% accuracy drop.
Uses a direct constrained optimization approach avoiding reliance on weight scales.
Abstract
Neural network pruning is an important step in the design process of efficient neural networks for edge devices with limited computational power. Pruning is a form of knowledge transfer from the weights of the original network to a smaller target subnetwork. We propose a new method for compute-constrained structured channel-wise pruning of convolutional neural networks. The method iteratively fine-tunes the network while gradually tapering the computational resources available to the pruned network via a holonomic constraint within the framework of the method of Lagrange multipliers. Explicit and adaptive automatic control over the rate of tapering is provided. The trainable parameters of our pruning method are separate from the weights of the neural network, which allows us to avoid interference with the neural network solver (e.g. avoid the direct dependence of pruning speed on neural…
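The idea of tapering a resource budget through an equality constraint handled with a Lagrange multiplier can be illustrated on a toy problem. The sketch below is not the paper's algorithm: the quadratic stand-in for the task loss, the linear taper schedule, the per-channel gates, and all names (`taper_budget`, `prune_toy`, `importance`, `cost`) are assumptions made for illustration. It only shows the general pattern of descending on gate variables while ascending on a multiplier that enforces cost(gates) = budget(step).

```python
def taper_budget(step, taper_steps, full_cost, target_cost):
    """Linearly shrink the allowed cost, then hold it at the target (assumed schedule)."""
    frac = min(step / taper_steps, 1.0)
    return full_cost + frac * (target_cost - full_cost)


def prune_toy(importance, cost, target_cost,
              taper_steps=1000, total_steps=2000, lr=0.02, lr_lam=0.02):
    """Primal-dual updates on per-channel gates under a time-dependent equality constraint.

    Toy loss (a stand-in for the task loss): sum_i importance[i] * (1 - gate[i])**2.
    Constraint: sum_i cost[i] * gate[i] == budget(step).
    """
    gates = [1.0] * len(cost)   # every channel starts fully enabled
    lam = 0.0                   # Lagrange multiplier, kept separate from the gates
    full_cost = sum(cost)
    for step in range(total_steps):
        budget = taper_budget(step, taper_steps, full_cost, target_cost)
        # Gradient descent on the Lagrangian with respect to each gate.
        for i, (w, c) in enumerate(zip(importance, cost)):
            grad = -2.0 * w * (1.0 - gates[i]) + lam * c
            gates[i] = min(1.0, max(0.0, gates[i] - lr * grad))
        # Gradient ascent on the multiplier: it grows while the cost exceeds the budget.
        used = sum(c * g for c, g in zip(cost, gates))
        lam += lr_lam * (used - budget)
    return gates, lam


if __name__ == "__main__":
    importance = [4.0, 3.0, 2.0, 1.0]   # hypothetical per-channel importance
    cost = [1.0, 2.0, 3.0, 4.0]         # hypothetical per-channel compute cost
    gates, lam = prune_toy(importance, cost, target_cost=5.0)
    print("gates:", [round(g, 3) for g in gates])
    print("cost used:", round(sum(c * g for c, g in zip(cost, gates)), 3))
```

In this toy setting the gates and the multiplier are the only trainable quantities, which loosely mirrors the abstract's point that the pruning parameters are separate from the network weights; a real implementation would replace the quadratic term with the fine-tuning loss of the network.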
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling
