TL;DR
This paper introduces a novel differentiable quantization function for neural networks, enabling lossless, end-to-end low-bit quantization applicable to weights and activations, leading to improved performance on classification and detection tasks.
Contribution
The paper recasts neural network quantization as a differentiable function, which allows simple, uniform, and lossless end-to-end training and outperforms existing methods.
Findings
Outperforms state-of-the-art quantization methods on image classification.
Effective for both weights and activations in neural networks.
Enables a simple, uniform, and lossless quantization process.
Abstract
Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network into a low-bitwidth integer version, has been an active and promising research topic. Existing methods formulate the low-bit quantization of networks as an approximation or optimization problem. Approximation-based methods confront the gradient mismatch problem, while optimization-based methods are only suitable for quantizing weights and could introduce high computational cost in the training stage. In this paper, we propose a novel perspective of interpreting and implementing neural network quantization by formulating low-bit quantization as a differentiable non-linear function (termed quantization function). The proposed quantization function can…
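The abstract excerpt is cut off before it describes the form of the quantization function, so the following is only a rough illustration of the general idea, not the paper's formulation. One common way to make a staircase quantizer differentiable is to sum shifted sigmoids, one per quantization threshold, with a temperature that controls how sharp each step is. The names `soft_quantize`, `hard_quantize`, `thresholds`, `step`, and `temperature` are hypothetical and chosen here for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_quantize(x, thresholds, step=1.0, temperature=1.0):
    """Smooth staircase: one shifted sigmoid per threshold (illustrative assumption)."""
    return step * sum(sigmoid(temperature * (x - t)) for t in thresholds)


def hard_quantize(x, thresholds, step=1.0):
    """Non-differentiable staircase: counts how many thresholds x has crossed."""
    return step * sum(1 for t in thresholds if x > t)


# Hypothetical 2-bit setup: three thresholds give four levels {0, 1, 2, 3}.
thresholds = [0.5, 1.5, 2.5]

for x in (0.2, 1.0, 2.9):
    smooth = soft_quantize(x, thresholds, temperature=1.0)
    sharp = soft_quantize(x, thresholds, temperature=50.0)
    print(x, round(smooth, 4), round(sharp, 4), hard_quantize(x, thresholds))
```

At a low temperature the function is smooth everywhere, so gradients flow through it during training. As the temperature grows, it approaches the hard staircase away from the thresholds, which is the kind of behavior a differentiable quantizer needs in order to end at true low-bit values.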
