UPSCALE: Unconstrained Channel Pruning
Alvin Wan, Hanxiang Hao, Kaushik Patnaik, Yueyang Xu, Omer Hadad, David Güera, Zhile Ren, Qi Shan

TL;DR
UPSCALE reorders channels at export time so that channel-pruned networks can be pruned without constraints, which improves accuracy, while incurring fewer inference-time memory copies, which reduces latency.
Contribution
The paper proposes a generic algorithm for pruning models without constraints. It applies to any pruning pattern and improves both accuracy and inference speed.
Findings
Increases ImageNet accuracy by 2.1 points on average
Improves inference speed by up to 2x
Benefits hold across multiple architectures, such as DenseNet, EfficientNetV2, and ResNet
Abstract
As neural networks grow in size and complexity, inference speeds decline. To combat this, one of the most effective compression techniques -- channel pruning -- removes channels from weights. However, for multi-branch segments of a model, channel removal can introduce inference-time memory copies. In turn, these copies increase inference latency -- so much so that the pruned model can be slower than the unpruned model. As a workaround, pruners conventionally constrain certain channels to be pruned together. This fully eliminates memory copies but, as we show, significantly impairs accuracy. We now have a dilemma: Remove constraints but increase latency, or add constraints and impair accuracy. In response, our insight is to reorder channels at export time, (1) reducing latency by reducing memory copies and (2) improving accuracy by removing constraints. Using this insight, we design a…
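The reordering idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes one producer layer feeding two consumer branches that each keep a different channel subset, and it uses a brute-force search that is only feasible for a handful of channels. The names `is_contiguous`, `best_order`, and `as_slice` are hypothetical. The sketch shows that gathering arbitrary channels forces a copy, whereas a contiguous slice of a reordered tensor is a view.

```python
from itertools import permutations

import numpy as np


def is_contiguous(positions):
    """True if the (distinct) integer positions form one unbroken run."""
    positions = sorted(positions)
    return positions[-1] - positions[0] + 1 == len(positions)


def best_order(kept_sets):
    """Brute-force a channel order that makes as many consumers contiguous as possible.

    Illustrative only: factorial cost, not the method described in the paper.
    """
    union = sorted(set().union(*kept_sets))
    best, best_score = tuple(union), -1
    for order in permutations(union):
        pos = {ch: i for i, ch in enumerate(order)}
        score = sum(is_contiguous([pos[c] for c in kept]) for kept in kept_sets)
        if score > best_score:
            best, best_score = order, score
    return list(best), best_score


# Producer output with 6 channels (channels-first layout, 2x2 spatial size).
x = np.arange(6 * 2 * 2, dtype=np.float32).reshape(6, 2, 2)

# Unconstrained pruning: each consumer keeps its own channel subset.
kept_a, kept_b = [0, 3, 5], [1, 3]

# Without reordering, consumer A must gather its channels, which copies memory.
gather_a = x[kept_a]

# Reorder the surviving channels so each consumer reads one contiguous run.
order, score = best_order([set(kept_a), set(kept_b)])
pos = {ch: i for i, ch in enumerate(order)}

# Stand-in for a producer whose output channels were permuted at export time.
y = x[order]


def as_slice(kept):
    """Slice covering a consumer's channels; valid only if they are contiguous."""
    p = sorted(pos[c] for c in kept)
    return slice(p[0], p[-1] + 1)


# Both consumers are contiguous here (score == 2), so these are views, not copies.
view_a = y[as_slice(kept_a)]
view_b = y[as_slice(kept_b)]

print("gather shares memory:", np.shares_memory(x, gather_a))
print("slice shares memory:", np.shares_memory(y, view_a), np.shares_memory(y, view_b))
```

In a real export, the producer's weights would be permuted once offline rather than indexing the activation, and each consumer's input-channel weights would need the matching permutation, since its channels may arrive in a different order than before.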
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Concatenated Skip Connection · Batch Normalization · 1x1 Convolution · Dense Block · Max Pooling · Residual Connection · Residual Block
