Soft Weight-Sharing for Neural Network Compression
Karen Ullrich, Edward Meeds, Max Welling

TL;DR
This paper introduces a simple soft weight-sharing method for neural network compression that combines quantization and pruning in a single training process, inspired by the minimum description length (MDL) principle.
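As a rough illustration of the MDL view, and assuming the classic soft weight-sharing setup of Nowlan & Hinton (1992), the retraining objective can be read as an error cost plus a complexity cost. Here the complexity cost comes from a Gaussian mixture prior over all weights, with one component fixed at zero. The notation (targets T, inputs X, trade-off τ, mixing proportions π_j, means μ_j, standard deviations σ_j, J nonzero components) is ours and is not taken from this page:

```latex
\mathcal{L}(\mathbf{w}) =
  \underbrace{-\log p(\mathcal{T} \mid \mathcal{X}, \mathbf{w})}_{\text{error cost}}
  \;\underbrace{-\,\tau \log p(\mathbf{w})}_{\text{complexity cost}},
\qquad
p(\mathbf{w}) = \prod_{i=1}^{I} \sum_{j=0}^{J} \pi_j \,\mathcal{N}\!\left(w_i \mid \mu_j, \sigma_j^2\right),
\quad \mu_0 = 0 .
```

Under such a prior, weights that settle near μ_0 = 0 can be pruned, and the remaining weights cluster around the J nonzero means. A single training run can therefore yield both pruning and quantization.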
Contribution
It presents a unified approach to neural network compression based on soft weight-sharing. Multi-step pipelines retrain, prune and quantize in separate stages (e.g., Han et al., 2015a). This approach replaces them with a single (re-)training procedure.
Findings
Achieves competitive compression rates with a single (re-)training procedure
Simultaneously performs quantization and pruning
Provides insights into the relation between compression and MDL
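"Simultaneously performs quantization and pruning" can be made concrete with a minimal post-training sketch. It assumes that the retrained network comes with a Gaussian mixture over its weights, in which component 0 is fixed at mean 0. Each weight is replaced by the mean of the component with the highest responsibility. Weights assigned to the zero component are pruned, and the rest share a handful of values. The function name `quantize_by_responsibility` and all numbers below are hypothetical and are not the authors' code.

```python
import numpy as np


def quantize_by_responsibility(w, means, stds, mix):
    """Snap each weight to the mean of its most responsible mixture component.

    Assumes component 0 has mean 0, so weights assigned to it are pruned.
    Returns the quantized weights and the component index of each weight.
    """
    diff = w[:, None] - means[None, :]
    # Unnormalized log responsibilities: log(pi_j) + log N(w_i | mu_j, sigma_j^2),
    # dropping constants that are identical for every component.
    log_resp = np.log(mix) - np.log(stds) - 0.5 * (diff / stds) ** 2
    assignment = np.argmax(log_resp, axis=1)  # normalization does not change argmax
    return means[assignment], assignment


# Hypothetical retrained weights and mixture parameters (illustrative only).
means = np.array([0.0, -0.4, 0.3])
stds = np.array([0.02, 0.03, 0.03])
mix = np.array([0.95, 0.02, 0.03])
w = np.array([0.001, -0.39, 0.31, -0.005, 0.29, 0.0, 0.02, -0.41])

w_q, idx = quantize_by_responsibility(w, means, stds, mix)
pruned_fraction = np.mean(idx == 0)                # share of weights set to zero
distinct_nonzero = np.unique(w_q[w_q != 0.0])      # shared nonzero values
print(w_q)
print(f"pruned: {pruned_fraction:.2%}, shared nonzero values: {distinct_nonzero}")
```

In this sketch, storing the compressed layer would reduce to storing a component index for each nonzero weight, plus the few shared means.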
Abstract
The success of deep learning in numerous application domains has created the desire to run and train deep networks on mobile devices. This, however, conflicts with their computation-, memory- and energy-intensive nature, leading to a growing interest in compression. Recent work by Han et al. (2015a) proposes a pipeline that involves retraining, pruning and quantization of neural network weights, obtaining state-of-the-art compression rates. In this paper, we show that competitive compression rates can be achieved by using a version of soft weight-sharing (Nowlan & Hinton, 1992). Our method achieves both quantization and pruning in one simple (re-)training procedure. This point of view also exposes the relation between compression and the minimum description length (MDL) principle.
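The abstract does not spell out the penalty, so the sketch below rests on an assumption. It implements the classic soft weight-sharing term of Nowlan & Hinton (1992), which is the negative log of a Gaussian mixture prior over the weights, together with its gradient. It then runs plain gradient-descent steps on a toy weight vector using that term alone. The mixture parameters (`MEANS`, `STDS`, `MIX`), the helper names and the learning rate are hypothetical. A real retraining run would add the data loss, and full soft weight-sharing would also learn the mixture parameters.

```python
import numpy as np

# Hypothetical mixture parameters (illustrative values, not from the paper):
# component 0 is pinned at mean 0 with a large mixing proportion (pruning),
# the other components are shared values the weights can collapse onto.
MEANS = np.array([0.0, -0.5, 0.5])
STDS = np.array([0.05, 0.05, 0.05])
MIX = np.array([0.9, 0.05, 0.05])


def log_component_densities(w, means, stds, mix):
    """Return log(pi_j * N(w_i | mu_j, sigma_j^2)) with shape (num_weights, num_components)."""
    diff = w[:, None] - means[None, :]
    return (np.log(mix) - np.log(stds) - 0.5 * np.log(2 * np.pi)
            - 0.5 * (diff / stds) ** 2)


def penalty_and_grad(w, means=MEANS, stds=STDS, mix=MIX):
    """Negative log mixture prior -log p(w) and its gradient with respect to w."""
    log_dens = log_component_densities(w, means, stds, mix)
    log_norm = np.logaddexp.reduce(log_dens, axis=1)  # log sum_j pi_j N(w_i | ...)
    resp = np.exp(log_dens - log_norm[:, None])        # responsibilities r_ij
    penalty = -np.sum(log_norm)
    # d/dw_i [-log p(w_i)] = sum_j r_ij * (w_i - mu_j) / sigma_j^2
    grad = np.sum(resp * (w[:, None] - means[None, :]) / stds ** 2, axis=1)
    return penalty, grad


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.3, size=1000)   # toy "pretrained" weights
before, _ = penalty_and_grad(w)

lr = 1e-4  # small enough that lr / sigma^2 = 0.04 keeps every step a descent step
for _ in range(200):
    _, g = penalty_and_grad(w)
    w -= lr * g

after, _ = penalty_and_grad(w)
print(f"penalty before: {before:.1f}, after: {after:.1f}")
```

The gradient pulls each weight toward the means of the components that claim it, weighted by responsibility. The weights therefore end up clustered around a few shared values (quantization), and the heavily weighted zero component collects many of them (pruning).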
Taxonomy
Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis
Methods: Pruning
