What to Prune and What Not to Prune at Initialization
Maham Haroon

TL;DR
This paper introduces two novel initialization pruning methods, K-starts and dissipating gradients, that achieve high sparsity in neural networks without retraining or task-specific tuning.
Contribution
It presents new initialization pruning techniques that outperform existing methods in achieving sparsity while preserving network performance.
Findings
The combination of the two methods outperforms each individual approach and random dropout.
The methods require neither task-specific knowledge nor retraining.
They achieve high sparsity while maintaining performance.
Abstract
Post-training dropout-based approaches achieve high sparsity and are well-established means of addressing computational cost and overfitting in neural network architectures. By contrast, pruning at initialization is still far behind. Initialization pruning is more effective at scaling down the computational cost of the network. Furthermore, it handles overfitting just as well as post-training dropout. For these reasons, the paper presents two approaches to pruning at initialization. The goal is to achieve higher sparsity while preserving performance. 1) K-starts begins with k random p-sparse matrices at initialization. In the first couple of epochs the network then determines the "fittest" of these p-sparse matrices in an attempt to find the "lottery ticket" p-sparse network. The approach is adopted from how evolutionary algorithms find…
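The K-starts idea described in the abstract (draw k random p-sparse masks, train briefly, keep the fittest) can be illustrated with a minimal sketch. The abstract is truncated here, so everything below is an assumption made for illustration rather than the paper's implementation: the linear model, the use of mean squared error after a few gradient steps as the fitness score, and the names `random_sparse_masks`, `short_train_loss`, and `k_starts` are all hypothetical.

```python
import numpy as np

def random_sparse_masks(shape, p, k, rng):
    """Return k binary masks, each with a fraction p of its entries set to zero."""
    n = int(np.prod(shape))
    n_zero = int(round(p * n))
    masks = []
    for _ in range(k):
        flat = np.ones(n)
        flat[rng.choice(n, size=n_zero, replace=False)] = 0.0
        masks.append(flat.reshape(shape))
    return masks

def short_train_loss(mask, X, y, W0, steps=20, lr=0.1):
    """Train a masked linear model for a few steps and return its final MSE.

    Assumption: the short-training loss stands in for the "fitness" of a mask.
    """
    W = W0 * mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ W - y) / len(X)
        W -= lr * grad * mask  # masked update, so pruned weights stay at zero
    return float(np.mean((X @ W - y) ** 2))

def k_starts(X, y, shape, p=0.5, k=8, seed=0):
    """Pick the fittest of k random p-sparse masks (hypothetical selection rule)."""
    rng = np.random.default_rng(seed)
    W0 = rng.normal(scale=0.1, size=shape)  # shared dense initialization
    masks = random_sparse_masks(shape, p, k, rng)
    losses = [short_train_loss(m, X, y, W0) for m in masks]
    best = int(np.argmin(losses))
    return masks[best], losses

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(64, 10))
    y = X @ rng.normal(size=(10, 3))
    mask, losses = k_starts(X, y, shape=(10, 3), p=0.5, k=4)
    print("kept weights:", int(mask.sum()), "candidate losses:", losses)
```

All candidates share the same dense initialization so that differences in loss reflect the mask alone. Whether the paper compares candidates this way is not stated in the visible part of the abstract.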
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition · Evolutionary Algorithms and Applications
Methods: Pruning · Dropout
