Soft Quantization: Model Compression Via Weight Coupling
Daniel T. Bernstein, Luca Di Carlo, David Schwab

TL;DR
This paper introduces a novel soft quantization method that uses weight coupling during training to efficiently discretize neural network weights, improving model compression and offering insights into the compression-generalization trade-off.
Contribution
It proposes a soft quantization technique based on short-range attractive weight coupling during training. Within an appropriate range of hyperparameters, it outperforms histogram-equalized post-training quantization on ResNet-20/CIFAR-10.
Findings
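Outperforms histogram-equalized post-training quantization on ResNet-20/CIFAR-10 within an appropriate range of hyperparameters
Induces rapid, mixed-precision discretization of the weight distribution with only two additional hyperparameters
Provides a new pipeline for flexible model compression and a tool for studying the trade-off between compression and generalization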
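For readers unfamiliar with the baseline named above, the following is a minimal sketch of one common reading of histogram-equalized post-training quantization: bin edges are placed at evenly spaced quantiles so that each bin holds roughly the same number of weights, and each weight is replaced by the mean of its bin. The function name `histogram_equalized_quantize`, the use of bin means as representative levels, and the per-tensor treatment are assumptions made for illustration; the exact baseline used in the paper may differ.

```python
import numpy as np

def histogram_equalized_quantize(weights, n_levels):
    """Quantize weights to n_levels values using equal-population (quantile) bins.

    Illustrative sketch only; not taken from the paper.
    """
    w = np.asarray(weights, dtype=np.float64)
    flat = w.ravel()
    # Bin edges at evenly spaced quantiles: each bin holds about the same number of weights.
    edges = np.quantile(flat, np.linspace(0.0, 1.0, n_levels + 1))
    # Assign each weight to a bin using interior edges only; indices lie in 0..n_levels-1.
    idx = np.searchsorted(edges[1:-1], flat, side="right")
    # Represent each bin by the mean of its members (empty bins fall back to the midpoint).
    levels = 0.5 * (edges[:-1] + edges[1:])
    for k in range(n_levels):
        members = flat[idx == k]
        if members.size:
            levels[k] = members.mean()
    return levels[idx].reshape(w.shape), levels
```

Calling `histogram_equalized_quantize(w, 4)` on a weight tensor returns a tensor of the same shape with at most four distinct values, together with those values. Because the bins follow the empirical distribution, dense regions of the weight histogram receive finer resolution than sparse tails.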
Abstract
We show that introducing short-range attractive couplings between the weights of a neural network during training provides a novel avenue for model quantization. These couplings rapidly induce the discretization of a model's weight distribution, and they do so in a mixed-precision manner despite only relying on two additional hyperparameters. We demonstrate that, within an appropriate range of hyperparameters, our "soft quantization" scheme outperforms histogram-equalized post-training quantization on ResNet-20/CIFAR-10. Soft quantization provides both a new pipeline for the flexible compression of machine learning models and a new tool for investigating the trade-off between compression and generalization in high-dimensional loss landscapes.
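The abstract does not give the functional form of the coupling, so the following is only a sketch of how a short-range attractive interaction between weights might look. It assumes a Gaussian well between every pair of weights, with two hypothetical hyperparameters, `strength` and `coupling_range`, standing in for the two additional hyperparameters the abstract mentions. In training, a term like this would be added to the task loss; here it is minimized on its own to show that nearby weights merge while distant groups stay apart.

```python
import numpy as np

def coupling_energy_and_grad(w, strength, coupling_range):
    """Pairwise attractive energy and its gradient (assumed Gaussian-well form).

    E(w) = -(strength / N) * sum_{i<j} exp(-(w_i - w_j)^2 / (2 * coupling_range^2))
    This form is an illustrative assumption, not the paper's stated definition.
    """
    w = np.asarray(w, dtype=np.float64)
    n = w.size
    diff = w[:, None] - w[None, :]                      # all pairwise differences
    kernel = np.exp(-diff**2 / (2.0 * coupling_range**2))
    # Sum over i<j: drop the diagonal (each entry is 1) and halve the symmetric sum.
    energy = -strength * (kernel.sum() - n) / (2.0 * n)
    # dE/dw_i = (strength / N) * sum_j (w_i - w_j) / range^2 * kernel_ij
    grad = strength * (diff * kernel).sum(axis=1) / (n * coupling_range**2)
    return energy, grad

def relax(w, strength, coupling_range, lr=0.01, steps=50):
    """Plain gradient descent on the coupling energy alone (no task loss)."""
    w = np.asarray(w, dtype=np.float64).copy()
    for _ in range(steps):
        _, grad = coupling_energy_and_grad(w, strength, coupling_range)
        w -= lr * grad
    return w
```

With `relax(np.array([0.0, 0.05, 1.0, 1.04]), strength=1.0, coupling_range=0.1)`, the two weights near 0 and the two near 1 each collapse toward a shared value, while the two groups remain well separated because their distance is far larger than `coupling_range`. The pairwise computation is quadratic in the number of weights, so a practical implementation would need a cheaper approximation.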
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
