Picking Winning Tickets Before Training by Preserving Gradient Flow
Chaoqi Wang, Guodong Zhang, Roger Grosse

TL;DR
This paper introduces GraSP, a method for pruning neural networks at initialization by preserving gradient flow, enabling resource-efficient training with minimal accuracy loss across multiple datasets and architectures.
Contribution
The paper proposes Gradient Signal Preservation (GraSP), a criterion for pruning networks at initialization, which reduces the resources needed for training while maintaining accuracy.
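To make the idea of scoring weights by their effect on gradient flow more concrete, here is a minimal sketch on a toy problem. It is not the authors' implementation. It assumes a score of the form -theta * (H g), where g is the loss gradient at initialization and H is the Hessian, and it assumes that the weights with the highest scores are removed first; neither detail is stated on this page. The linear least-squares model, the finite-difference Hessian-gradient product, and all function names are hypothetical choices made only for illustration.

```python
# Toy sketch of a gradient-flow-based pruning score (pure Python, no dependencies).
# Assumptions (not from this page): a linear least-squares model, a score of the
# form -theta * (H g), and pruning of the highest-scoring weights first.

def grad(theta, xs, ys):
    """Gradient of L(theta) = 1/(2n) * sum_i (x_i . theta - y_i)^2."""
    n = len(xs)
    g = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        residual = sum(xj * tj for xj, tj in zip(x, theta)) - y
        for j, xj in enumerate(x):
            g[j] += residual * xj / n
    return g

def hessian_gradient_product(theta, xs, ys, eps=1e-3):
    """Approximate H g with a central finite difference of the gradient along g."""
    g = grad(theta, xs, ys)
    plus = grad([t + eps * gj for t, gj in zip(theta, g)], xs, ys)
    minus = grad([t - eps * gj for t, gj in zip(theta, g)], xs, ys)
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

def prune_mask(theta, xs, ys, sparsity):
    """Return (mask, scores); the mask zeroes the `sparsity` fraction with the highest scores."""
    hg = hessian_gradient_product(theta, xs, ys)
    scores = [-t * h for t, h in zip(theta, hg)]
    n_prune = int(sparsity * len(theta))
    ranked = sorted(range(len(theta)), key=lambda j: scores[j], reverse=True)
    mask = [1] * len(theta)
    for j in ranked[:n_prune]:
        mask[j] = 0
    return mask, scores

# Hypothetical data: four weights, one "batch" of four examples.
theta = [1.0, -1.0, 0.5, 0.5]
xs = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 2.0, 0.0, 0.0],
      [0.0, 0.0, 3.0, 0.0],
      [0.0, 0.0, 0.0, 4.0]]
ys = [0.0, 0.0, 0.0, 0.0]

mask, scores = prune_mask(theta, xs, ys, sparsity=0.5)
print(mask, scores)
```

In a real network the Hessian-gradient product would normally come from automatic differentiation on a mini-batch rather than from finite differences; the finite-difference version is used here only to keep the sketch dependency-free.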
Findings
Prunes 80% of the weights of VGG-16 on ImageNet at initialization with only a 1.6% drop in accuracy.
Outperforms baseline methods at high sparsity levels.
Effective across CIFAR, Tiny-ImageNet, and ImageNet datasets.
Abstract
Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can reduce test-time resource requirements, but is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Medical Image Segmentation Techniques
Methods: Pruning · Test · Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization
