Fast Adjustable Threshold For Uniform Neural Network Quantization (Winning solution of LPIRC-II)
Alexander Goncharenko, Andrey Denisov, Sergey Alyamkin, Evgeny Terentev

TL;DR
This paper introduces a fast, adjustable threshold method for neural network quantization that simplifies the process, reduces training time, and maintains high accuracy, making it suitable for mobile device deployment.
Contribution
It proposes a novel trained scale factor approach for quantization thresholds, enabling rapid fine-tuning and minimal accuracy loss in quantized neural networks.
Findings
Achieved 74.8% accuracy on MNAS with quantization, close to the original 75.3%.
Reduced fine-tuning epochs to 8, significantly speeding up the quantization process.
Provided an open-source implementation for practical use.
Abstract
Neural network quantization is a necessary step for porting neural networks to mobile devices. Quantization accelerates inference and reduces memory consumption and model size. It can be performed without fine-tuning by using a calibration procedure (calculating the parameters necessary for quantization), or the network can be trained with quantization from scratch. Training with quantization from scratch on labeled data is a rather long and resource-consuming procedure. Quantizing a network without fine-tuning leads to an accuracy drop because of outliers that appear during calibration. In this article we propose to simplify the quantization procedure significantly by introducing trained scale factors for the quantization thresholds. This speeds up quantization with fine-tuning, which takes up to 8 epochs, and reduces the requirements…
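The idea of a scale factor applied to a calibrated threshold can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the symmetric 8-bit scheme, and the clipping range for the scale factor `alpha` are assumptions made for illustration, and `alpha` is set by hand here rather than trained by gradient descent.

```python
import numpy as np

def fake_quantize(x, threshold, num_bits=8):
    """Symmetric uniform quantize-dequantize of x onto [-threshold, threshold]."""
    levels = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    step = threshold / levels                   # width of one quantization bin
    q = np.clip(np.round(x / step), -levels, levels)
    return q * step

def adjusted_threshold(calibrated_max, alpha, alpha_min=0.5, alpha_max=1.0):
    """Scale a calibrated threshold by alpha (hypothetical clipping range)."""
    return float(np.clip(alpha, alpha_min, alpha_max)) * calibrated_max

# Synthetic activations with one outlier that inflates the calibrated threshold.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
x[0] = 20.0
t_max = float(np.abs(x).max())                  # threshold found by calibration

def inlier_mse(alpha):
    """Quantization error on all values except the outlier."""
    t = adjusted_threshold(t_max, alpha)
    return float(np.mean((x[1:] - fake_quantize(x[1:], t)) ** 2))

err_full = inlier_mse(1.0)    # threshold kept at the calibrated maximum
err_half = inlier_mse(0.5)    # threshold shrunk, finer bins for typical values
print(err_full, err_half)
```

Shrinking the threshold clips the outlier but halves the bin width for the remaining values, which is the trade-off a trained scale factor would be expected to balance during fine-tuning.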
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques
