Fighting Quantization Bias With Bias
Alexander Finkelstein, Uri Almog, Mark Grobman

TL;DR
This paper identifies a shift in mean activation values caused by an inherent bias in the quantization of low-precision neural networks, and proposes simple, fast correction methods that restore performance without extensive retraining.
Contribution
The paper introduces a bias compensation technique for quantized neural networks that corrects activation shifts with minimal data and computation, making deployment more efficient.
Findings
Bias correction restores network accuracy effectively.
Methods require only small unlabeled data sets.
Performance matches training-based quantization methods.
Abstract
Low-precision representation of deep neural networks (DNNs) is critical for efficient deployment of deep learning applications on embedded platforms. However, converting a network to low precision degrades its performance. Crucially, networks that are designed for embedded applications usually suffer from increased degradation since they have less redundancy. This is most evident for the ubiquitous MobileNet architecture, which requires a costly quantization-aware training cycle to achieve acceptable performance when quantized to 8 bits. In this paper, we trace the source of the degradation in MobileNets to a shift in the mean activation value. This shift is caused by an inherent bias in the quantization process which builds up across layers, shifting all network statistics away from the learned distribution. We show that this phenomenon happens in other architectures as well. We…
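The mechanism described in the abstract (a quantization-induced shift in mean activations that can be absorbed into a layer's bias) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names `quantize_weights` and `bias_correction`, the symmetric per-tensor quantizer, the single linear layer, and the 4-bit setting (chosen only to make the shift easy to see) are all assumptions made for illustration. The sketch measures the per-channel mean output shift on a small unlabeled calibration batch and subtracts it from the bias.

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    """Symmetric per-tensor fake quantization (quantize, then dequantize).
    Assumed quantizer for illustration; the paper's scheme may differ."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def bias_correction(w, w_q, calib_x):
    """Per-output-channel mean shift introduced by replacing w with w_q,
    estimated on unlabeled calibration inputs."""
    return (calib_x @ w_q.T - calib_x @ w.T).mean(axis=0)

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 64))          # hypothetical layer: 64 inputs, 16 outputs
b = rng.normal(size=16)
calib_x = rng.random(size=(256, 64))   # non-negative inputs, like post-ReLU activations

w_q = quantize_weights(w, num_bits=4)
b_corr = b - bias_correction(w, w_q, calib_x)   # fold the shift into the bias

# Compare per-channel mean outputs against the full-precision layer.
ref = calib_x @ w.T + b
err_before = np.abs((calib_x @ w_q.T + b - ref).mean(axis=0)).max()
err_after = np.abs((calib_x @ w_q.T + b_corr - ref).mean(axis=0)).max()
print(f"max mean shift before: {err_before:.3e}, after: {err_after:.3e}")
```

Because the layer is linear, the correction removes the mean shift on the calibration batch up to floating-point error; on unseen data it only reduces the shift to the extent that the calibration batch is representative. In a multi-layer network the correction would be applied layer by layer, since the abstract notes that the shift builds up across layers.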
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
