Rethinking Weight Decay For Efficient Neural Network Pruning

Hugo Tessier; Vincent Gripon; Mathieu Léonardon; Matthieu Arzel; Thomas Hannagan; David Bertrand

arXiv:2011.10520 · cs.NE · March 10, 2022

TL;DR

This paper introduces Selective Weight Decay (SWD), a method grounded in Lagrangian smoothing that prunes neural networks continuously during training and applies to multiple tasks, networks, and pruning structures.

Contribution

The paper proposes SWD, a versatile pruning technique grounded in Lagrangian smoothing that induces sparsity continuously during training and compares favorably to existing methods in performance-to-parameters ratio.

Findings

01. SWD compares favorably to state-of-the-art pruning methods in performance-to-parameters ratio.

02. SWD applies to multiple tasks, networks, and pruning structures.

03. These results are reported on the CIFAR-10, Cora, and ImageNet ILSVRC2012 datasets.

Abstract

Introduced in the late 1980s for generalization purposes, pruning has now become a staple for compressing deep neural networks. Despite many innovations in recent decades, pruning approaches still face core issues that hinder their performance or scalability. Drawing inspiration from early work in the field, and especially the use of weight decay to achieve sparsity, we introduce Selective Weight Decay (SWD), which carries out efficient, continuous pruning throughout training. Our approach, theoretically grounded on Lagrangian smoothing, is versatile and can be applied to multiple tasks, networks, and pruning structures. We show that SWD compares favorably to state-of-the-art approaches, in terms of performance-to-parameters ratio, on the CIFAR-10, Cora, and ImageNet ILSVRC2012 datasets.
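The idea of applying weight decay selectively so that pruning happens gradually during training can be illustrated with a small sketch. This is not the authors' implementation (the linked repository is the reference for that). It is a toy, pure-Python illustration under stated assumptions: the weights targeted for pruning are chosen by smallest magnitude, the extra penalty coefficient grows exponentially over training, and every name here (`targeted_indices`, `selective_decay_step`, `penalty_schedule`, `mu`, `a`, `a_min`, `a_max`) is hypothetical.

```python
def targeted_indices(weights, target_sparsity):
    """Assumed criterion: target the smallest-magnitude fraction of weights."""
    n_prune = int(target_sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    return set(order[:n_prune])


def selective_decay_step(weights, grads, lr, mu, a, target_sparsity):
    """One SGD step with ordinary weight decay plus an extra, selective decay."""
    targeted = targeted_indices(weights, target_sparsity)
    new_weights = []
    for i, (w, g) in enumerate(zip(weights, grads)):
        decay = mu * w              # ordinary weight decay on every weight
        if i in targeted:
            decay += a * mu * w     # extra decay only on the targeted weights
        new_weights.append(w - lr * (g + decay))
    return new_weights


def penalty_schedule(step, total_steps, a_min, a_max):
    """Assumed exponential ramp of the selective penalty from a_min to a_max."""
    t = step / max(total_steps - 1, 1)
    return a_min * (a_max / a_min) ** t


# Toy run: the task gradient is set to zero to isolate the effect of the decay.
weights = [0.9, -0.05, 0.4, 0.02, -0.7, 0.1]
total_steps = 200
for step in range(total_steps):
    a = penalty_schedule(step, total_steps, a_min=1.0, a_max=1e4)
    grads = [0.0] * len(weights)
    weights = selective_decay_step(
        weights, grads, lr=0.05, mu=1e-3, a=a, target_sparsity=0.5
    )

# Final hard pruning: the targeted weights are already close to zero,
# so removing them changes the network very little.
to_remove = targeted_indices(weights, 0.5)
pruned = [0.0 if i in to_remove else w for i, w in enumerate(weights)]
```

In this sketch the targeted set is recomputed at every step, so a weight that grows back during training can leave the set; the penalty only becomes strong late in training, which is what makes the final hard pruning nearly lossless in the toy run.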

Code & Models

Repositories

HugoTessier-lab/SWD
PyTorch · Official

Taxonomy

Methods: Pruning · Weight Decay