TL;DR
This paper introduces Selective Weight Decay (SWD), a method grounded in Lagrangian smoothing that prunes neural networks continuously during training and applies to multiple tasks, networks, and pruning structures.
Contribution
The paper proposes SWD, a theoretically grounded, versatile pruning technique that induces sparsity continuously and efficiently during training, and that compares favorably to state-of-the-art methods in performance-to-parameters ratio.
Findings
SWD compares favorably to state-of-the-art methods in performance-to-parameters ratio.
SWD is applicable across multiple tasks, networks, and pruning structures.
SWD is evaluated on the CIFAR-10, Cora, and ImageNet ILSVRC2012 datasets, where it compares favorably to state-of-the-art approaches.
Abstract
Introduced in the late 1980s for generalization purposes, pruning has now become a staple for compressing deep neural networks. Despite many innovations in recent decades, pruning approaches still face core issues that hinder their performance or scalability. Drawing inspiration from early work in the field, and especially the use of weight decay to achieve sparsity, we introduce Selective Weight Decay (SWD), which carries out efficient, continuous pruning throughout training. Our approach, theoretically grounded on Lagrangian smoothing, is versatile and can be applied to multiple tasks, networks, and pruning structures. We show that SWD compares favorably to state-of-the-art approaches, in terms of performance-to-parameters ratio, on the CIFAR-10, Cora, and ImageNet ILSVRC2012 datasets.
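The abstract describes the idea only at a high level: weight decay is applied selectively so that sparsity emerges gradually during training. The following is a minimal sketch of one way that could look, not the authors' implementation. It assumes the extra decay targets the smallest-magnitude weights and that its strength grows exponentially over training; neither detail is stated in the text above. The names `selective_decay_grad`, `extra_decay_scale`, and all numeric values are hypothetical, and the task-loss gradient is left out so that only the effect of the penalty is visible.

```python
def extra_decay_scale(step, total_steps, scale_min, scale_max):
    """Hypothetical schedule: grow the extra decay exponentially
    from scale_min (first step) to scale_max (last step)."""
    t = step / max(total_steps - 1, 1)
    return scale_min * (scale_max / scale_min) ** t


def selective_decay_grad(weights, prune_fraction, base_decay, extra_scale):
    """Gradient of an L2 penalty that hits every weight with base_decay
    and the smallest-magnitude fraction with an additional, larger decay.

    Assumption: the pruning targets are chosen by magnitude.
    """
    n_target = int(len(weights) * prune_fraction)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    targeted = set(by_magnitude[:n_target])
    grads = []
    for i, w in enumerate(weights):
        coeff = base_decay * (1.0 + extra_scale) if i in targeted else base_decay
        grads.append(2.0 * coeff * w)  # d/dw of coeff * w**2
    return grads


# Toy run: six weights, half of them targeted for pruning.
weights = [0.9, -0.05, 0.4, 0.02, -0.7, 0.1]
learning_rate = 0.1
total_steps = 200

for step in range(total_steps):
    scale = extra_decay_scale(step, total_steps, scale_min=1.0, scale_max=1e3)
    grads = selective_decay_grad(weights, prune_fraction=0.5,
                                 base_decay=1e-3, extra_scale=scale)
    # A real training step would add the task-loss gradient here.
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]

print(weights)
```

In this toy run the three targeted weights are driven close to zero while the others shrink only slightly, so removing the targeted weights at the end changes the network very little. That gradual transition is the intuition behind pruning continuously instead of in one abrupt step.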
Taxonomy
Methods: Pruning · Weight Decay
