ISQuant: apply squant to the real deployment
Dezan Zhao

TL;DR
This paper introduces ISQuant, a practical and efficient method for deploying 8-bit neural network models, bridging the gap between research and real-world application by reducing complexity and computational overhead.
Contribution
The paper proposes ISQuant, a deployment-oriented quantization method that is fast, simple, requires fewer parameters, and does not need training data, improving real-world neural network deployment.
Findings
ISQuant is fast and easy to use for 8-bit models.
It requires fewer parameters and less computation.
Experimental results show acceptable performance.
Abstract
Quantization of deep neural networks has garnered significant attention and has proven highly useful for compressing model size, reducing computation costs, and accelerating inference. Many researchers employ fake quantization to analyze or train the quantization process. However, fake quantization is not the final form used in deployment, and a gap exists between the academic setting and real-world deployment. In addition, the extra computation introduced by the scale and zero-point makes deployment challenging. In this study, we first analyze why the combination of quantization and dequantization is used to train the model and draw the conclusion that fake quantization research is reasonable due to the disappearance of weight gradients and the ability to approximate between fake and real quantization. Secondly, we propose ISQuant as a…
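The distinction the abstract draws between fake quantization and real (integer) quantization can be illustrated with a minimal NumPy sketch. This is generic 8-bit affine quantization with a scale and zero-point, not the ISQuant method itself, whose details are not given in this summary. All function names (`quant_params`, `quantize`, `dequantize`, `fake_quantize`, `int_dot`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def quant_params(x, num_bits=8):
    """Derive an affine scale and zero-point from the min/max of x (assumed scheme)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range so that 0.0 is always exactly representable.
    lo = min(float(x.min()), 0.0)
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Real quantization: float values -> unsigned integer codes."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)


def dequantize(q, scale, zero_point):
    """Map integer codes back to floats."""
    return (q.astype(np.float32) - zero_point) * scale


def fake_quantize(x, scale, zero_point, num_bits=8):
    """Fake quantization: quantize then dequantize, staying in floating point."""
    return dequantize(quantize(x, scale, zero_point, num_bits), scale, zero_point)


def int_dot(qa, sa, za, qw, sw, zw):
    """Integer dot product; zero-points and scales are the extra deployment work."""
    acc = np.dot(qa.astype(np.int32) - za, qw.astype(np.int32) - zw)
    return sa * sw * float(acc)


# Small demonstration on made-up data.
a = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
w = np.array([0.3, -0.7, 1.2, 0.1], dtype=np.float32)
sa, za = quant_params(a)
sw, zw = quant_params(w)

fake_result = float(np.dot(fake_quantize(a, sa, za), fake_quantize(w, sw, zw)))
real_result = int_dot(quantize(a, sa, za), sa, za, quantize(w, sw, zw), sw, zw)
print(fake_result, real_result, float(np.dot(a, w)))
```

Under these assumptions the fake-quantized and integer dot products agree up to floating-point rounding, which is one way to read the abstract's claim that fake quantization approximates real quantization. The subtraction of zero-points and the final rescaling in `int_dot` are the kind of additional computation the abstract says complicates deployment.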
Taxonomy
Topics: Security and Verification in Computing
