Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Yunshan Zhong, Gongrui Nan, Yuxin Zhang, Fei Chao, Rongrong Ji

TL;DR
This paper introduces a method called lottery ticket scratcher (LTS) that identifies and freezes weights early in quantization-aware training, reducing computation while maintaining or improving model performance.
Contribution
The paper proposes a novel heuristic, LTS, to effectively freeze weights during QAT, significantly reducing training complexity without sacrificing accuracy.
Findings
LTS eliminates 50%-70% of weight updates.
LTS reduces backward-pass FLOPs by 25%-35%.
LTS improves 2-bit MobileNetV2 accuracy by 5.05%.
Abstract
Quantization-aware training (QAT) is widely used because it largely preserves the performance of quantized networks. The prevailing practice in QAT is to update all quantized weights throughout the entire training process. In this paper, we challenge that practice based on a phenomenon we observed: a large portion of quantized weights reach their optimal quantization level after only a few training epochs, which we refer to as the partly scratch-off lottery ticket. This simple yet valuable observation motivates us to zero out the gradient calculations of these weights for the remainder of training to avoid meaningless updates. To find the ticket effectively, we develop a heuristic method, dubbed lottery ticket scratcher (LTS), which freezes a weight once the distance between the full-precision one and its quantization level is smaller…
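The freezing rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract excerpt is cut off before it states what the distance is compared against, so the fixed scalar `threshold` below is an assumption, as are the uniform quantizer, the function names, and the rule that a frozen weight stays frozen. The sketch also only skips the update of frozen weights; it does not reproduce the savings in backward-pass computation reported above.

```python
from typing import List


def quantize(w: float, step: float, qmin: int, qmax: int) -> float:
    """Uniform quantizer (assumed): snap w to the nearest level k * step, k in [qmin, qmax]."""
    k = max(qmin, min(qmax, round(w / step)))
    return k * step


def update_freeze_mask(
    weights: List[float],
    frozen: List[bool],
    step: float,
    qmin: int,
    qmax: int,
    threshold: float,  # hypothetical hyperparameter, not taken from the source
) -> List[bool]:
    """Freeze a weight once it is within `threshold` of its quantization level.

    Weights that were already frozen stay frozen (an assumption of this sketch).
    """
    new_frozen = []
    for w, f in zip(weights, frozen):
        dist = abs(w - quantize(w, step, qmin, qmax))
        new_frozen.append(f or dist < threshold)
    return new_frozen


def sgd_step(
    weights: List[float], grads: List[float], frozen: List[bool], lr: float
) -> List[float]:
    """Plain SGD update that leaves frozen weights untouched."""
    return [w if f else w - lr * g for w, g, f in zip(weights, grads, frozen)]


# Toy example: 2-bit signed levels {-1.0, -0.5, 0.0, 0.5} with step 0.5.
weights = [0.49, 0.26, -0.51, 0.13]
frozen = update_freeze_mask(
    weights, [False] * 4, step=0.5, qmin=-2, qmax=1, threshold=0.05
)
updated = sgd_step(weights, [1.0, 1.0, 1.0, 1.0], frozen, lr=0.1)
```

In this toy run the first and third weights sit within 0.05 of a quantization level, so they are frozen and skipped by the update, while the other two keep training.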
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Image Enhancement Techniques
Methods: Depthwise Convolution · Pointwise Convolution · Average Pooling · Depthwise Separable Convolution · Batch Normalization · Inverted Residual Block · Convolution · 1x1 Convolution
