ISQuant: apply squant to the real deployment

Dezan Zhao

arXiv:2407.11037·cs.LG·July 17, 2024

TL;DR

This paper introduces ISQuant, a practical and efficient method for deploying 8-bit neural network models, bridging the gap between research and real-world application by reducing complexity and computational overhead.

Contribution

The paper proposes ISQuant, a deployment-oriented quantization method that is fast, simple, requires fewer parameters, and does not need training data, improving real-world neural network deployment.

Findings

1. ISQuant is fast and easy to use for 8-bit models.
2. It requires fewer parameters and less computation.
3. Experimental results show acceptable performance.

Abstract

Model quantization for deep neural networks has attracted significant attention and has proven highly useful for compressing model size, reducing computation cost, and accelerating inference. Many researchers use fake quantization to analyze or train the quantization process. However, fake quantization is not the final form used in deployment, and a gap remains between the academic setting and real-world deployment. In addition, the extra computation introduced by the scale and zero-point makes deployment challenging. In this study, we first analyze why the combination of quantization and dequantization is used to train the model, and conclude that fake quantization research is reasonable due to the disappearance of weight gradients and the ability to approximate between fake and real quantization. Secondly, we propose ISQuant as a…
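The distinction the abstract draws between fake and real quantization can be illustrated with a minimal sketch of generic 8-bit affine quantization. This is not the ISQuant method itself, which the truncated abstract does not describe; the function names (quant_params, quantize, dequantize, fake_quantize) and the sample weights are hypothetical and chosen only for illustration. Real quantization keeps the integer codes, while fake quantization immediately maps them back to floats using the scale and zero-point.

```python
def quant_params(x_min, x_max, num_bits=8):
    """Scale and zero-point for unsigned affine quantization (illustrative)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Real quantization: floats -> integer codes clamped to [0, 2^bits - 1]."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, round(v / scale) + zero_point)) for v in x]


def dequantize(q, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [scale * (v - zero_point) for v in q]


def fake_quantize(x, scale, zero_point, num_bits=8):
    """Fake quantization: quantize, then dequantize, staying in float."""
    return dequantize(quantize(x, scale, zero_point, num_bits), scale, zero_point)


# Hypothetical weights, used only to exercise the functions.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = quant_params(min(weights), max(weights))
codes = quantize(weights, scale, zero_point)           # what deployment stores
approx = fake_quantize(weights, scale, zero_point)     # what training sees
max_error = max(abs(a - w) for a, w in zip(approx, weights))
```

In this sketch the deployed model would carry codes together with scale and zero_point, and every dequantization costs one subtraction and one multiplication per value, which is the kind of extra scale and zero-point computation the abstract refers to.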

Taxonomy

Topics: Security and Verification in Computing