Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Ruihao Gong; Xianglong Liu; Shenghu Jiang; Tianxiang Li; Peng Hu; Jiazhen Lin; Fengwei Yu; Junjie Yan

arXiv:1908.05033·cs.CV·August 15, 2019·60 cites

TL;DR

This paper introduces Differentiable Soft Quantization (DSQ), a novel method that enables stable training of low-bit neural networks by bridging the gap with full-precision models, leading to improved accuracy and deployment efficiency.

Contribution

The paper proposes DSQ, a differentiable quantization technique that automatically adapts during training to better approximate standard low-bit quantization.

Findings

01 DSQ outperforms existing quantization methods on various network architectures.

02 Training with DSQ results in higher accuracy for low-bit neural networks.

03 Efficient deployment of 2-4 bit DSQ on ARM devices achieves up to 1.7× speedup.

Abstract

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on resource-limited devices like mobile phones. However, due to the discreteness of low-bit quantization, existing quantization methods often face the unstable training process and severe performance degradation. To address this problem, in this paper we propose Differentiable Soft Quantization (DSQ) to bridge the gap between the full-precision and low-bit networks. DSQ can automatically evolve during training to gradually approximate the standard quantization. Owing to its differentiable property, DSQ can help pursue the accurate gradients in backward propagation, and reduce the quantization loss in forward process with an appropriate clipping range. Extensive…
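The abstract does not give DSQ's functional form, so the following is only a minimal sketch of the general idea: a smooth staircase that approaches standard uniform quantization as a sharpness parameter grows. The tanh-based shape, the parameter name `k`, and both function names are assumptions made for illustration, not the paper's stated definitions.

```python
import math


def hard_quantize(x, lower, upper, bits):
    """Standard uniform quantization: clip to [lower, upper], snap to the nearest level."""
    intervals = 2 ** bits - 1                 # number of gaps between levels
    delta = (upper - lower) / intervals       # width of one interval
    x = min(max(x, lower), upper)             # clipping range
    return lower + delta * math.floor((x - lower) / delta + 0.5)


def soft_quantize(x, lower, upper, bits, k):
    """Smooth stand-in for hard_quantize (assumed tanh form, hypothetical sharpness k).

    Small k gives a nearly linear (identity-like) map inside the clipping range;
    large k gives a nearly step-like map that matches hard_quantize.
    """
    intervals = 2 ** bits - 1
    delta = (upper - lower) / intervals
    x = min(max(x, lower), upper)
    # Index of the interval containing x, capped so x == upper stays in the last one.
    i = min(int((x - lower) / delta), intervals - 1)
    mid = lower + (i + 0.5) * delta           # midpoint of that interval
    # Rescale tanh so it reaches exactly -1 and +1 at the interval's two ends,
    # which keeps neighbouring pieces continuous.
    scale = 1.0 / math.tanh(0.5 * k * delta)
    phi = scale * math.tanh(k * (x - mid))
    return lower + delta * (i + 0.5 * (phi + 1.0))


if __name__ == "__main__":
    for k in (0.01, 5.0, 100.0):
        print(k, soft_quantize(0.2, -1.0, 1.0, 2, k))
    print("hard", hard_quantize(0.2, -1.0, 1.0, 2))
```

Because the soft version has a nonzero derivative everywhere inside the clipping range, gradients can flow through it during training, which is the property the abstract attributes to DSQ. How the sharpness and clipping range are actually learned is not described in the text above and is left out of this sketch.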

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings