UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks
Chaim Baskin, Eli Schwartz, Evgenii Zheltonozhskii, Natan Liss, Raja Giryes, Alex M. Bronstein, Avi Mendelson

TL;DR
This paper introduces UNIQ, a neural network quantization method that emulates a non-uniform $k$-quantile quantizer. When results are compared by the number of bit operations (BOPS) performed, it shows advantages in the low computational budget regime.
Contribution
The paper proposes UNIQ, a new approach to neural network quantization that adapts to parameter distributions, providing an alternative to uniform quantization and demonstrating advantages in low BOPS regimes.
Findings
Advantages in low computational budget regimes.
Comparable performance with non-uniform quantization.
Sets a basis for new neural network quantization methods.
Abstract
We present a novel method for neural network quantization that emulates a non-uniform $k$-quantile quantizer, which adapts to the distribution of the quantized parameters. Our approach provides a novel alternative to the existing uniform quantization techniques for neural networks. We suggest comparing the results as a function of the bit operations (BOPS) performed, assuming the availability of a look-up table for the non-uniform case. In this setup, we show the advantages of our strategy in the low computational budget regime. While the proposed solution is harder to implement in hardware, we believe it sets a basis for new alternatives to neural network quantization.
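To make the idea of a $k$-quantile quantizer concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `k_quantile_quantize`, the use of empirical quantiles of the weights as bin edges, and the choice of each bin's median as its representation level are all assumptions made for this example. It covers only the quantizer that UNIQ emulates, not the noise-injection training procedure.

```python
import numpy as np

def k_quantile_quantize(w, k):
    """Quantize w to k levels using equal-probability (quantile) bins.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    w = np.asarray(w, dtype=np.float64)
    # Bin edges at the i/k empirical quantiles, so each bin holds
    # roughly 1/k of the values regardless of the distribution's shape.
    edges = np.quantile(w, np.linspace(0.0, 1.0, k + 1))
    # Assumed representation level: the median of each bin, i.e. the
    # empirical quantile at the midpoint of the bin's probability range.
    levels = np.quantile(w, (np.arange(k) + 0.5) / k)
    # Assign each value a bin index in [0, k-1] using the interior edges.
    idx = np.searchsorted(edges[1:-1], w, side="right")
    return levels[idx], idx, levels

# Example: 2-bit (k = 4) quantization of normally distributed weights.
rng = np.random.default_rng(0)
weights = rng.normal(size=10_000)
quantized, indices, levels = k_quantile_quantize(weights, k=4)
```

Because the bins follow the empirical distribution, the levels are packed densely where the weights concentrate and sparsely in the tails. A uniform quantizer, by contrast, spaces its levels evenly across the value range.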
