A Generalization of Continuous Relaxation in Structured Pruning
Brad Larson, Bishal Upadhyaya, Luke McDermott, Siddha Ganju

TL;DR
This paper introduces a generalized structured pruning method that achieves high sparsity and FLOPs reduction in neural networks without accuracy loss, enabling efficient GPU execution on standard hardware.
Contribution
It presents a novel generalization of structured pruning algorithms that allows for stable convergence at high sparsity levels, matching or surpassing state-of-the-art results.
Findings
Achieves up to 93% sparsity and 95% FLOPs reduction without accuracy loss.
Demonstrates efficient GPU execution without sparse matrix operations.
Validates on CIFAR-10, ImageNet, and Cityscapes with ResNet and U-Net.
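To see why structured pruning can run on a GPU without sparse matrix operations, consider the minimal sketch below. It is not the paper's algorithm: the two-layer network, the `keep` flags, and the helper names (`collapse`, `masked`, `forward`) are hypothetical, chosen only for illustration. Whole hidden units are removed, so the pruned network is simply a smaller dense network that produces the same output as the masked original.

```python
# Illustrative sketch only; not the method from the paper.
# A tiny two-layer ReLU network (no biases) stored as plain nested lists.

def matvec(weights, x):
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def relu(v):
    return [max(0.0, a) for a in v]

def forward(w1, w2, x):
    return matvec(w2, relu(matvec(w1, x)))

def masked(w1, keep):
    """Zero out the rows of w1 for pruned hidden units (shape unchanged)."""
    return [row if k else [0.0] * len(row) for row, k in zip(w1, keep)]

def collapse(w1, w2, keep):
    """Physically remove pruned hidden units: drop their rows in w1
    and the matching columns in w2, leaving smaller dense matrices."""
    kept = [i for i, k in enumerate(keep) if k]
    w1_small = [w1[i] for i in kept]
    w2_small = [[row[i] for i in kept] for row in w2]
    return w1_small, w2_small

def count_params(*mats):
    return sum(len(row) for m in mats for row in m)

w1 = [[1.0, -2.0], [0.5, 0.5], [-1.0, 3.0], [2.0, 1.0]]  # 4 hidden units, 2 inputs
w2 = [[1.0, 2.0, -1.0, 0.5]]                             # 1 output
keep = [True, False, False, True]                        # hypothetical pruning decision
x = [1.0, 2.0]

w1_small, w2_small = collapse(w1, w2, keep)
masked_out = forward(masked(w1, keep), w2, x)   # full-size network with zeroed units
small_out = forward(w1_small, w2_small, x)      # collapsed dense network

params_before = count_params(w1, w2)
params_after = count_params(w1_small, w2_small)
sparsity = 1.0 - params_after / params_before
```

Here `small_out` equals `masked_out`, while the collapsed weights hold half as many parameters. The equivalence relies on a pruned unit contributing exactly zero, which holds in this bias-free toy but needs extra care with biases or normalization layers.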
Abstract
Deep learning harnesses massive parallel floating-point processing to train and evaluate large neural networks. Trends indicate that deeper and larger neural networks with an increasing number of parameters achieve higher accuracy than smaller neural networks. This performance improvement, which often requires heavy compute for both training and evaluation, eventually needs to translate well to resource-constrained hardware for practical value. Structured pruning asserts that while large networks enable us to find solutions to complex computer vision problems, a smaller, computationally efficient sub-network can be derived from the large neural network that retains model accuracy but significantly improves computational efficiency. We generalize structured pruning with algorithms for network augmentation, pruning, sub-network collapse and removal. In addition, we demonstrate efficient…
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: Average Pooling · 1x1 Convolution · Concatenated Skip Connection · Global Average Pooling · Residual Connection · Residual Block · Batch Normalization · Kaiming Initialization · Convolution
