A probabilistic framework for dynamic quantization
Gabriele Santini, Francesco Paissan, Elisabetta Farella

TL;DR
This paper introduces a probabilistic framework for dynamic, input-adaptive neural network quantization that maintains high accuracy with minimal computational overhead and achieves a better accuracy-overhead tradeoff than standard quantization strategies.
Contribution
The paper presents a novel probabilistic approach that models the network's pre-activations through a lightweight surrogate, enabling efficient, input-dependent adjustment of the quantization parameters.
Findings
Negligible performance loss on vision tasks
Best tradeoff between performance and computational overhead among the standard quantization strategies compared
Effective adaptive quantization without significant memory overhead
Abstract
We propose a probabilistic framework for dynamic quantization of neural networks that allows for a computationally efficient, input-adaptive rescaling of the quantization parameters. Our framework applies a probabilistic model to the network's pre-activations through a lightweight surrogate, enabling the adaptive adjustment of the quantization parameters on a per-input basis without significant memory overhead. We validate our approach on a set of popular computer vision tasks and models, observing only a negligible loss in performance. Compared to standard quantization strategies, our method achieves the best tradeoff between performance and computational overhead.
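The abstract does not spell out the probabilistic model or the surrogate, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes the pre-activations of a linear layer are roughly Gaussian, uses a hypothetical surrogate that fits that Gaussian from a small fixed subset of output units (`probe_rows`), and sets the clipping range to μ ± kσ (k = 3 is also an assumption) before deriving an 8-bit scale and zero-point for the current input. All function names are illustrative.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def linear(weights: Matrix, bias: Sequence[float], x: Sequence[float]) -> List[float]:
    """Full pre-activations y = W x + b (pure Python for clarity)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def surrogate_stats(weights: Matrix, bias: Sequence[float], x: Sequence[float],
                    probe_rows: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical lightweight surrogate: fit a Gaussian N(mu, sigma^2) to the
    pre-activations of a small, fixed subset of output units."""
    sample = [sum(w * xi for w, xi in zip(weights[r], x)) + bias[r] for r in probe_rows]
    return statistics.fmean(sample), statistics.pstdev(sample)


def affine_params(lo: float, hi: float, bits: int = 8) -> Tuple[float, int]:
    """Scale and zero-point of an asymmetric uniform quantizer covering [lo, hi]."""
    qmax = 2 ** bits - 1
    scale = max(hi - lo, 1e-12) / qmax
    zero_point = min(max(round(-lo / scale), 0), qmax)
    return scale, zero_point


def fake_quantize(values: Sequence[float], scale: float, zero_point: int,
                  bits: int = 8) -> List[float]:
    """Quantize to integers in [0, 2^bits - 1], then map back to floats."""
    qmax = 2 ** bits - 1
    return [(min(max(round(v / scale) + zero_point, 0), qmax) - zero_point) * scale
            for v in values]


def dynamic_quantized_linear(weights: Matrix, bias: Sequence[float], x: Sequence[float],
                             probe_rows: Sequence[int], bits: int = 8,
                             k: float = 3.0) -> Tuple[List[float], Tuple[float, int]]:
    """Per-input quantization of the pre-activations: the clipping range
    [mu - k*sigma, mu + k*sigma] comes from the surrogate's Gaussian fit,
    so no reduction over the full output is needed to choose the scale."""
    mu, sigma = surrogate_stats(weights, bias, x, probe_rows)
    scale, zero_point = affine_params(mu - k * sigma, mu + k * sigma, bits)
    y = linear(weights, bias, x)
    return fake_quantize(y, scale, zero_point, bits), (scale, zero_point)


# Usage on a random toy layer (zero bias, so the scale is exactly proportional to x).
rng = random.Random(0)
d_in, d_out = 64, 256
W = [[rng.gauss(0.0, d_in ** -0.5) for _ in range(d_in)] for _ in range(d_out)]
b = [0.0] * d_out
probe = range(0, d_out, 16)  # 16 of the 256 output units act as the surrogate
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

y_q, (s1, zp1) = dynamic_quantized_linear(W, b, x, probe)
_, (s2, zp2) = dynamic_quantized_linear(W, b, [10 * xi for xi in x], probe)
print(s2 / s1)  # ≈ 10: the scale tracks the magnitude of the current input
```

The design choice this illustrates is that the scale is recomputed for every input from a handful of probe outputs, instead of being fixed once at calibration time (static quantization) or computed from a min/max pass over every pre-activation (a common dynamic baseline).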
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Adversarial Robustness in Machine Learning
Methods: Sparse Evolutionary Training
