Make $\ell_1$ Regularization Effective in Training Sparse CNN
Juncai He, Xiaodong Jia, Jinchao Xu, Lian Zhang, Liang Zhao

TL;DR
This paper demonstrates how to effectively apply $\ell_1$ regularization to train sparse CNNs by replacing SGD with a regularized dual averaging method, achieving high sparsity without accuracy loss.
Contribution
It introduces a modified training algorithm based on RDA for $\ell_1$ regularization in CNNs, overcoming previous incompatibility issues.
Findings
Achieves 95% sparsity in ResNet18 on CIFAR-10.
Outperforms existing weight pruning methods in sparsity.
Maintains high accuracy with sparse models.
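The sparsity figures above can be read as the fraction of weights that are zero. The summary does not state the exact definition, so the sketch below is an assumption: it counts entries whose magnitude is at most `tol` across a list of weight arrays. The names `sparsity` and `tol` are hypothetical and not taken from the paper.

```python
import numpy as np

def sparsity(weights, tol=0.0):
    """Fraction of entries with |w| <= tol across all arrays in `weights`.

    Assumed definition: sparsity = (number of zero weights) / (total weights).
    """
    total = sum(w.size for w in weights)
    zeros = sum(int(np.sum(np.abs(w) <= tol)) for w in weights)
    return zeros / total

# Toy example with two "layers" (illustrative values only).
layers = [np.array([0.0, 1.0, 0.0, 0.0]), np.zeros((2, 2))]
print(sparsity(layers))
```

Under this definition, 95% sparsity would mean that 19 out of every 20 weights are zero.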
Abstract
Compressed sensing using $\ell_1$ regularization is among the most powerful and popular sparsification techniques in many applications, but why has it not been used to obtain sparse deep learning models such as convolutional neural networks (CNNs)? This paper aims to answer this question and to show how to make it work. We first demonstrate that the commonly used stochastic gradient descent (SGD) training algorithm and its variants are not an appropriate match for $\ell_1$ regularization, and then replace it with a different training algorithm based on a regularized dual averaging (RDA) method. RDA was originally designed specifically for convex problems, but with new theoretical insight and algorithmic modifications (using proper initialization and adaptivity), we have made it an effective match for $\ell_1$ regularization to achieve state-of-the-art sparsity for CNNs compared…
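To make the RDA idea concrete, here is a minimal sketch of the standard (convex) $\ell_1$-RDA update: keep a running average of past gradients, soft-threshold that average by the regularization weight, and scale the result by $-\sqrt{t}/\gamma$. This is the textbook form of the method, not the paper's modified algorithm; the initialization and adaptivity changes mentioned in the abstract are not reproduced. The names `l1_rda`, `soft_threshold`, `lam`, `gamma`, and the toy quadratic objective are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def l1_rda(grad_fn, dim, lam=0.1, gamma=1.0, steps=2000):
    """Basic l1-RDA loop (standard convex form, illustrative only).

    grad_fn: returns a (stochastic) gradient of the loss at w, without the l1 term.
    lam:     l1 regularization weight.
    gamma:   scaling constant of the sqrt(t) proximal term.
    """
    w = np.zeros(dim)
    g_bar = np.zeros(dim)  # running average of all gradients seen so far
    for t in range(1, steps + 1):
        g = grad_fn(w)
        g_bar += (g - g_bar) / t
        # Closed-form minimizer of <g_bar, w> + lam*||w||_1 + gamma/(2*sqrt(t))*||w||^2
        w = -(np.sqrt(t) / gamma) * soft_threshold(g_bar, lam)
    return w

# Toy problem (assumption): loss 0.5 * ||w - c||^2, so the gradient is w - c.
c = np.array([1.0, 0.0, -2.0, 0.05])
w = l1_rda(lambda w: w - c, dim=4, lam=0.1)
print(w)
```

In this toy run, the coordinates whose averaged gradient never exceeds `lam` in magnitude stay exactly zero. By contrast, a plain subgradient step on the $\ell_1$ term generally leaves small nonzero values, which is one way to read the abstract's point that SGD is a poor match for $\ell_1$ regularization.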
Taxonomy
Methods: Pruning
