TL;DR
This paper introduces an integer-only quantization scheme and a co-designed training procedure that together enable efficient, accurate deep learning inference on mobile devices with integer-only hardware, improving the tradeoff between accuracy and on-device latency.
Contribution
It presents a novel quantization scheme and training procedure that together enable end-to-end integer-only inference with minimal accuracy loss.
Findings
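Improved tradeoff between accuracy and on-device latency after quantization, even on MobileNets, a model family already known for run-time efficiency.
Inference carried out with integer-only arithmetic, which can be implemented more efficiently than floating-point inference on integer-only hardware.
Demonstrated on ImageNet classification and COCO detection, running on popular CPUs.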
Abstract
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
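The abstract does not spell out the arithmetic, so the following is only a minimal sketch of one common way to do integer-only inference: an affine mapping between real values and 8-bit integers, r ≈ S · (q − Z), with the dot product accumulated entirely in integers. The function names (`choose_qparams`, `quantize`, `dequantize`, `int_dot`), the 8-bit width, and the min/max range selection are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: affine quantization r ~= scale * (q - zero_point).
# All names and the 8-bit unsigned range are assumptions made for this example.

def choose_qparams(rmin, rmax, num_bits=8):
    """Pick a scale and zero-point so that real 0.0 maps exactly to an integer."""
    qmin, qmax = 0, (1 << num_bits) - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmax == rmin:
        return 1.0, 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(r, scale, zero_point, num_bits=8):
    """Map a real value to an unsigned integer, clamping to the representable range."""
    q = round(r / scale) + zero_point
    return max(0, min((1 << num_bits) - 1, q))

def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate real value."""
    return scale * (q - zero_point)

def int_dot(qa, za, qb, zb):
    """Accumulate sum((qa_i - za) * (qb_i - zb)) using integer arithmetic only."""
    return sum((a - za) * (b - zb) for a, b in zip(qa, qb))

# Usage: approximate a real-valued dot product from quantized operands.
a = [0.5, -0.25, 0.75]
b = [0.2, 0.4, -0.6]
sa, za = choose_qparams(-1.0, 1.0)
sb, zb = choose_qparams(-0.6, 0.4)
qa = [quantize(x, sa, za) for x in a]
qb = [quantize(x, sb, zb) for x in b]
acc = int_dot(qa, za, qb, zb)   # integer accumulator
approx = sa * sb * acc          # a single rescale recovers the real-valued result
print(approx)
```

In this sketch the only floating-point step is the final multiplication by `sa * sb`. A fully integer pipeline would replace that rescale with a fixed-point multiplier, which is omitted here for brevity.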
