TL;DR
This paper introduces PROFIT, a new training approach for sub-4-bit MobileNet models that addresses activation instability caused by weight quantization, enabling high-accuracy low-precision mobile networks.
Contribution
The paper proposes PROFIT, a novel progressive freezing training method, along with DuQ and negative padding techniques, to improve sub-4-bit MobileNet quantization.
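To make the "asymmetric activation" point concrete, the sketch below evaluates h-swish, using the standard definition x * ReLU6(x + 3) / 6, and shows that its output range dips slightly below zero. An unsigned quantization grid starting at zero cannot represent that negative range. The function name `h_swish` is chosen here for illustration, and the sketch does not reproduce how DuQ or negative padding work.

```python
def h_swish(x: float) -> float:
    """Standard h-swish: x * ReLU6(x + 3) / 6."""
    relu6 = min(max(x + 3.0, 0.0), 6.0)
    return x * relu6 / 6.0


# Scan a coarse grid to locate the most negative output.
grid = [i / 100.0 for i in range(-400, 401)]
lowest_x = min(grid, key=h_swish)

print(lowest_x, h_swish(lowest_x))  # the minimum sits at x = -1.5
print(h_swish(3.0))                 # for x >= 3 the function is the identity
```

The output range is roughly [-0.375, +inf): a small negative lobe next to an unbounded positive side, which is why a quantizer built for non-negative ReLU outputs does not fit directly.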
Findings
4-bit MobileNet-v1, v2, and v3 reach top-1 accuracy within 1.48% of the full-precision baseline on ImageNet.
The method outperforms previous techniques in 3-bit MobileNet-v3 quantization.
Accuracy is largely retained even at aggressive quantization levels (4 bits and below).
Abstract
4-bit and lower precision mobile models are required due to the ever-increasing demand for better energy efficiency in mobile devices. In this work, we report that the activation instability induced by weight quantization (AIWQ) is the key obstacle to sub-4-bit quantization of mobile networks. To alleviate the AIWQ problem, we propose a novel training method called PROgressive-Freezing Iterative Training (PROFIT), which attempts to freeze layers whose weights are affected by the instability problem more strongly than the other layers. We also propose a differentiable and unified quantization method (DuQ) and a negative padding idea to support asymmetric activation functions such as h-swish. We evaluate the proposed methods by quantizing MobileNet-v1, v2, and v3 on ImageNet and report that 4-bit quantization offers accuracy comparable to the full-precision baseline (within 1.48% top-1 accuracy).…
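The progressive-freezing idea described in the abstract can be illustrated with a minimal sketch. It assumes every layer already has a scalar instability score (how such a score is measured is not given in this summary, so the numbers below are made up) and that layers are frozen in roughly equal groups, most unstable first. The names `freezing_schedule`, `num_stages`, and the layer names are hypothetical; this is not the authors' implementation.

```python
from typing import Dict, List


def freezing_schedule(instability: Dict[str, float], num_stages: int) -> List[List[str]]:
    """Group layers for freezing, most unstable group first.

    `instability` maps a layer name to an assumed score where a higher value
    means the layer suffers more from quantization-induced instability.
    May return fewer than `num_stages` groups when layers do not divide evenly.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if not instability:
        return []
    # Rank layer names from most to least unstable.
    ranked = sorted(instability, key=instability.get, reverse=True)
    # Ceiling division so that every layer lands in some group.
    group_size = -(-len(ranked) // num_stages)
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]


# Made-up scores for four hypothetical layers.
scores = {"conv1": 0.1, "dw3": 0.9, "dw5": 0.7, "fc": 0.2}

frozen = set()
trainable_per_stage = []
for group in freezing_schedule(scores, num_stages=2):
    frozen.update(group)  # freeze this group's weights
    trainable = sorted(name for name in scores if name not in frozen)
    trainable_per_stage.append(trainable)
    # A real training loop would fine-tune only `trainable` layers here.
    print("frozen:", sorted(frozen), "| still trainable:", trainable)
```

In a deep-learning framework, "freezing" would typically mean excluding those layers' weights from gradient updates while the remaining layers keep training.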
