Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan

TL;DR
This paper introduces Differentiable Soft Quantization (DSQ), a method that stabilizes the training of low-bit neural networks by narrowing the gap between them and their full-precision counterparts, improving both accuracy and deployment efficiency.
Contribution
The paper proposes DSQ, a differentiable quantization technique that automatically adapts during training to better approximate standard low-bit quantization.
Findings
DSQ outperforms existing quantization methods on various network architectures.
Training with DSQ results in higher accuracy for low-bit neural networks.
An efficient implementation of 2- to 4-bit DSQ on ARM devices achieves up to 1.7× speedup over the open-source 8-bit NCNN inference framework.
Abstract
Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on resource-limited devices like mobile phones. However, due to the discreteness of low-bit quantization, existing quantization methods often face the unstable training process and severe performance degradation. To address this problem, in this paper we propose Differentiable Soft Quantization (DSQ) to bridge the gap between the full-precision and low-bit networks. DSQ can automatically evolve during training to gradually approximate the standard quantization. Owing to its differentiable property, DSQ can help pursue the accurate gradients in backward propagation, and reduce the quantization loss in forward process with an appropriate clipping range. Extensive…
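The idea of a quantizer that is differentiable yet gradually approaches standard quantization can be illustrated with a small sketch. The code below is an assumption-laden illustration, not necessarily the paper's exact formulation: it uses a piecewise tanh curve inside each quantization interval, and the names `hard_quantize`, `soft_quantize`, and the sharpness parameter `k` are hypothetical. A small `k` keeps the curve close to the identity inside the clipping range; a large `k` makes it approach the hard staircase.

```python
import math


def hard_quantize(x, lower, upper, bits):
    """Standard uniform quantization: clip to [lower, upper], then round."""
    intervals = 2 ** bits - 1                 # number of intervals between levels
    delta = (upper - lower) / intervals       # width of one interval
    x = min(max(x, lower), upper)             # apply the clipping range
    # floor(v + 0.5) rounds half up, avoiding Python's banker's rounding
    return lower + delta * math.floor((x - lower) / delta + 0.5)


def soft_quantize(x, lower, upper, bits, k):
    """Soft (differentiable) staircase; k is an assumed sharpness parameter."""
    intervals = 2 ** bits - 1
    delta = (upper - lower) / intervals
    if x <= lower:                            # values outside the range are clipped
        return lower
    if x >= upper:
        return upper
    i = min(int((x - lower) / delta), intervals - 1)   # index of the interval containing x
    mid = lower + (i + 0.5) * delta                     # midpoint of that interval
    scale = 1.0 / math.tanh(0.5 * k * delta)            # makes the curve hit -1 and +1 at the interval ends
    phi = scale * math.tanh(k * (x - mid))              # smooth value in [-1, 1]
    return lower + delta * (i + 0.5 * (phi + 1.0))


if __name__ == "__main__":
    for k in (1.0, 5.0, 50.0):
        print(k, soft_quantize(0.2, -1.0, 1.0, 2, k), hard_quantize(0.2, -1.0, 1.0, 2))
```

In a training loop, a function of this shape would typically be applied to weights or activations with `k` increased over time, so that the forward pass ends up close to hard quantization while the gradient stays well defined throughout; how `k` and the clipping range are actually scheduled or learned is left open here.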
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
