Soft Quantization: Model Compression Via Weight Coupling

Daniel T. Bernstein; Luca Di Carlo; David Schwab

arXiv:2601.21219 · cs.LG · January 30, 2026

TL;DR

This paper introduces soft quantization, a method that adds short-range attractive couplings between a network's weights during training so that the weight distribution rapidly discretizes. The result is a flexible compression pipeline and a tool for studying the compression-generalization trade-off.

Contribution

It proposes a soft quantization technique based on weight coupling that, within an appropriate range of hyperparameters, outperforms histogram-equalized post-training quantization on ResNet-20/CIFAR-10.

Findings

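01 Outperforms histogram-equalized post-training quantization on ResNet-20/CIFAR-10 within an appropriate range of hyperparameters

02 Induces rapid, mixed-precision weight discretization with only two additional hyperparameters

03 Provides a new pipeline for flexible model compression and a tool for investigating the trade-off between compression and generalization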

Abstract

We show that introducing short-range attractive couplings between the weights of a neural network during training provides a novel avenue for model quantization. These couplings rapidly induce the discretization of a model's weight distribution, and they do so in a mixed-precision manner despite only relying on two additional hyperparameters. We demonstrate that, within an appropriate range of hyperparameters, our "soft quantization" scheme outperforms histogram-equalized post-training quantization on ResNet-20/CIFAR-10. Soft quantization provides both a new pipeline for the flexible compression of machine learning models and a new tool for investigating the trade-off between compression and generalization in high-dimensional loss landscapes.
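To make the idea of a short-range attractive coupling concrete, here is a minimal sketch in plain Python. The abstract does not state the functional form of the coupling, so everything specific below is an assumption made for illustration: a pairwise Gaussian well, E = -(strength / n) * sum over pairs of exp(-(w_i - w_j)^2 / (2 * width^2)), with the hypothetical names `strength` and `width` standing in for the two additional hyperparameters. The task loss is omitted entirely, so this only shows how such a term, on its own, pulls nearby weights onto shared values while leaving distant weights untouched.

```python
import math


def coupling_energy_and_grad(weights, strength, width):
    """Assumed pairwise Gaussian-well coupling (not taken from the paper).

    E = -(strength / n) * sum_{i<j} exp(-(w_i - w_j)^2 / (2 * width^2))
    Returns the energy and its gradient with respect to each weight.
    """
    n = len(weights)
    energy = 0.0
    grad = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            diff = weights[i] - weights[j]
            well = math.exp(-diff * diff / (2.0 * width * width))
            energy -= strength * well / n
            # dE/dw_i = (strength / n) * well * diff / width^2, opposite sign for w_j
            pull = strength * well * diff / (n * width * width)
            grad[i] += pull
            grad[j] -= pull
    return energy, grad


def relax(weights, strength=1.0, width=0.1, lr=0.005, steps=500):
    """Plain gradient descent on the coupling term alone (no task loss)."""
    w = list(weights)
    for _ in range(steps):
        _, grad = coupling_energy_and_grad(w, strength, width)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def count_clusters(weights, tol=1e-3):
    """Count groups of weights separated by gaps larger than tol."""
    ordered = sorted(weights)
    clusters = 1
    for a, b in zip(ordered, ordered[1:]):
        if b - a > tol:
            clusters += 1
    return clusters


if __name__ == "__main__":
    initial = [-0.52, -0.50, -0.47, 0.48, 0.50, 0.53]
    final = relax(initial)
    print([round(w, 4) for w in final])
    print("clusters:", count_clusters(final))
```

In this toy setting, weights that start within roughly one `width` of each other merge, while groups separated by many widths feel essentially no force, which is the sense in which the coupling is short-range. In an actual training run a term like this would be added to the task loss and evaluated far more efficiently than the O(n^2) double loop used here.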

Taxonomy

Topics: Advanced Data Compression Techniques · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques