StatQAT: Statistical Quantizer Optimization for Deep Networks
Mehmet Aktukmak, Daniel Huang, Ke Ding

TL;DR
This paper introduces a statistical error analysis framework and new quantizers for uniform and floating-point quantization of deep neural networks, improving accuracy and training stability across data distributions and number formats.
Contribution
It presents a statistical error analysis of uniform and floating-point quantization, iterative quantizers for arbitrary data distributions, and analytic quantizers for Gaussian-like weight distributions, aimed at low-precision neural network training.
Findings
Improved accuracy in quantized neural networks.
Enhanced stability during training with new quantizers.
Effective quantization across multiple data formats.
Abstract
Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes, selecting optimal quantization parameters remains a key challenge, particularly for diverse data distributions encountered during training and inference. This work presents a novel statistical error analysis framework for uniform and floating-point quantization, providing theoretical insight into error behavior across quantization configurations. Building on this analysis, we propose iterative quantizers designed for arbitrary data distributions and analytic quantizers tailored for Gaussian-like weight distributions. These methods enable efficient, low-error quantization suitable for both activations and weights. We incorporate our quantizers into…
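To make the idea of an iterative quantizer for arbitrary data concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm, which the abstract does not spell out. It assumes a symmetric uniform quantizer and uses a generic alternating least-squares refinement of the scale: fix the integer codes, solve for the scale that minimizes mean squared error, re-quantize, and repeat. The names quantize, mse, and fit_scale, the 4-bit default, and the max-abs initialization are all illustrative assumptions.

```python
import random

def quantize(x, scale, bits=4):
    """Symmetric uniform quantizer: round x/scale to an integer code, then clip."""
    qmax = 2 ** (bits - 1) - 1
    q = round(x / scale)
    return max(-qmax, min(qmax, q))

def mse(data, scale, bits=4):
    """Mean squared error between the data and its dequantized values."""
    return sum((x - scale * quantize(x, scale, bits)) ** 2 for x in data) / len(data)

def fit_scale(data, bits=4, iters=20):
    """Alternating refinement of the scale (an assumed, generic scheme).

    Each step fixes the integer codes q_i and picks the least-squares scale
    s = sum(x_i * q_i) / sum(q_i^2), then re-quantizes with the new scale.
    """
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(x) for x in data)
    if peak == 0:
        return 1.0, 0.0  # all-zero input: any positive scale gives zero error
    scale = peak / qmax  # max-abs initialization (no clipping)
    best_scale, best_err = scale, mse(data, scale, bits)
    for _ in range(iters):
        codes = [quantize(x, scale, bits) for x in data]
        denom = sum(q * q for q in codes)
        if denom == 0:
            break  # every value rounded to zero; nothing left to refine
        scale = sum(x * q for x, q in zip(data, codes)) / denom
        err = mse(data, scale, bits)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

# Usage: fit a 4-bit scale to Gaussian-like samples and compare with max-abs.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
scale, err = fit_scale(data, bits=4)
baseline = mse(data, max(abs(x) for x in data) / 7, bits=4)
print(f"refined scale={scale:.4f} mse={err:.5f} (max-abs mse={baseline:.5f})")
```

Because the function keeps the best scale seen so far, its error is never worse than the max-abs starting point. For Gaussian-like data the refined scale typically clips a few outliers in exchange for finer resolution near zero, which is the kind of trade-off a statistical error analysis is meant to quantify.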
