Soft Threshold Weight Reparameterization for Learnable Sparsity
Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi

TL;DR
This paper introduces Soft Threshold Reparameterization (STR), a method that learns non-uniform, layer-wise sparsity budgets in neural networks. It reports accuracy gains of up to 10% in ultra-sparse (99%) regimes and FLOPs reductions of up to 50%.
Contribution
STR is a new soft-threshold operator-based approach that learns layer-wise sparsity thresholds, outperforming heuristic methods in neural network pruning.
Findings
Achieves state-of-the-art accuracy for unstructured sparsity in CNNs.
Reduces FLOPs by up to 50% through learned non-uniform sparsity budgets.
Boosts accuracy by up to 10% in ultra sparse (99%) regimes.
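Why a non-uniform budget can change FLOPs at a fixed parameter count can be seen with a little arithmetic. The sketch below is illustrative only: the two convolution layers, their shapes, and the densities are hypothetical values chosen for the example, not numbers from the paper, and it assumes the cost of an unstructured-sparse convolution scales with the fraction of weights kept.

```python
# Toy arithmetic: same number of kept weights, different FLOPs.
# All layer shapes and densities here are hypothetical.

def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

# (name, kernel, c_in, c_out, out_height, out_width)
LAYERS = [
    ("early", 3, 64, 64, 56, 56),   # few weights, large feature map
    ("late", 3, 512, 512, 7, 7),    # many weights, small feature map
]

def budget_cost(densities):
    """Return (kept weights, multiply-accumulates) for per-layer densities."""
    kept, macs = 0.0, 0.0
    for (_, k, c_in, c_out, h, w), d in zip(LAYERS, densities):
        p = conv_params(k, c_in, c_out)
        kept += d * p
        # Each kept weight is used once per output position.
        macs += d * p * h * w
    return kept, macs

# Uniform budget: every layer keeps 10% of its weights.
uniform_kept, uniform_macs = budget_cost([0.10, 0.10])

# Non-uniform budget: prune the early layer harder (5% kept) and give the
# freed weights to the late layer so the total kept count is unchanged.
p_early = conv_params(3, 64, 64)
p_late = conv_params(3, 512, 512)
late_density = (uniform_kept - 0.05 * p_early) / p_late
nonuniform_kept, nonuniform_macs = budget_cost([0.05, late_density])

flops_ratio = nonuniform_macs / uniform_macs
print(f"kept weights: {uniform_kept:.0f} vs {nonuniform_kept:.0f}")
print(f"MACs ratio (non-uniform / uniform): {flops_ratio:.3f}")
```

In this toy case a weight in the early layer costs far more compute than a weight in the late layer, so moving the budget between layers lowers total multiply-accumulates without changing the number of kept weights. Which layers STR actually makes sparser is learned during training and is not modeled here.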
Abstract
Sparsity in Deep Neural Networks (DNNs) is studied extensively with the focus of maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets which have sub-optimal layer-wise parameter allocation resulting in a) lower prediction accuracy or b) higher inference cost (FLOPs). This work proposes Soft Threshold Reparameterization (STR), a novel use of the soft-threshold operator on DNN weights. STR smoothly induces sparsity while learning pruning thresholds thereby obtaining a non-uniform sparsity budget. Our method achieves state-of-the-art accuracy for unstructured sparsity in CNNs (ResNet50 and MobileNetV1 on ImageNet-1K), and, additionally, learns non-uniform budgets that empirically reduce the FLOPs by up to 50%. Notably, STR boosts the accuracy over existing results by up to 10% in the ultra sparse (99%)…
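The soft-threshold operator mentioned in the abstract can be sketched in a few lines. This is a minimal forward-pass illustration, assuming the standard definition sign(w) * max(|w| - alpha, 0) with one threshold per layer. Mapping the learnable scalar s to a positive threshold through a sigmoid is an assumption of this sketch, and the layer names, weights, and values of s are hypothetical. Training (updating s and the weights by backpropagation) is omitted.

```python
import math

def soft_threshold(w, alpha):
    """Shrink w toward zero by alpha; values within [-alpha, alpha] become 0."""
    magnitude = abs(w) - alpha
    if magnitude <= 0.0:
        return 0.0
    return magnitude if w > 0 else -magnitude

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def str_layer(weights, s):
    """Apply one layer-wide threshold alpha = sigmoid(s) to every weight.

    The sigmoid mapping is an assumption of this sketch; it simply keeps
    the threshold positive while s ranges over all real numbers.
    """
    alpha = sigmoid(s)
    return [soft_threshold(w, alpha) for w in weights]

def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

# Hypothetical two-layer model with one threshold parameter per layer.
layers = {
    "conv1": [0.9, -0.03, 0.4, -0.7],
    "fc": [0.02, -0.6, 0.01, 0.03],
}
threshold_params = {"conv1": -3.0, "fc": -1.0}

sparse_layers = {
    name: str_layer(weights, threshold_params[name])
    for name, weights in layers.items()
}
for name, weights in sparse_layers.items():
    print(name, f"sparsity={sparsity(weights):.2f}")
```

Because each layer carries its own threshold, different layers end up with different fractions of zeroed weights, which is what a non-uniform sparsity budget means here. Surviving weights are also shrunk by the threshold rather than passed through unchanged, which distinguishes soft thresholding from hard magnitude pruning.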
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Infrastructure Maintenance and Monitoring
Methods: Pruning · Depthwise Convolution · Pointwise Convolution · Average Pooling · Global Average Pooling · Depthwise Separable Convolution · 1x1 Convolution · Batch Normalization · Dense Connections
