A probabilistic framework for dynamic quantization

Gabriele Santini; Francesco Paissan; Elisabetta Farella

arXiv:2505.10689 · cs.LG · May 19, 2025

Open Access

TL;DR

This paper introduces a probabilistic framework for dynamic, input-adaptive quantization of neural networks. It keeps accuracy close to the unquantized baseline with little computational overhead and offers a better accuracy-overhead tradeoff than standard quantization strategies.

Contribution

It presents a novel probabilistic approach enabling efficient, input-dependent adjustment of quantization parameters in neural networks.

Findings

1. Negligible performance loss on vision tasks
2. Superior tradeoff between accuracy and computational cost
3. Effective adaptive quantization without significant memory overhead

Abstract

We propose a probabilistic framework for dynamic quantization of neural networks that allows for a computationally efficient input-adaptive rescaling of the quantization parameters. Our framework applies a probabilistic model to the network's pre-activations through a lightweight surrogate, enabling the adaptive adjustment of the quantization parameters on a per-input basis without significant memory overhead. We validate our approach on a set of popular computer vision tasks and models, observing only a negligible loss in performance. Our method achieves the best tradeoff between performance and computational overhead compared to standard quantization strategies.
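
The abstract does not specify the probabilistic model or the surrogate, so the following is only a minimal sketch of the general idea and should not be read as the authors' method. It assumes a single linear layer, treats the input entries as i.i.d. with the input's sample mean and variance, and models each pre-activation as Gaussian, so that a quantization range can be predicted from cheap input statistics instead of from the pre-activations themselves. The function names (`precompute`, `surrogate_range`, `quantize`) and the width parameter `k` are hypothetical.

```python
import numpy as np

def quantize(z, lo, hi, bits=8):
    """Uniform affine fake-quantization of z onto 2**bits levels spanning [lo, hi]."""
    levels = 2 ** bits - 1
    scale = max(hi - lo, 1e-12) / levels
    q = np.clip(np.round((z - lo) / scale), 0, levels)  # integer codes
    return q * scale + lo                                # dequantized values

def precompute(W):
    """Per-row weight statistics, computed once offline (assumption of this sketch)."""
    return W.sum(axis=1), np.sqrt((W ** 2).sum(axis=1))

def surrogate_range(x, w_sum, w_norm, b, k=4.0):
    """Predict a quantization range for z = W @ x + b without computing z.

    Assumes the entries of x are i.i.d. with the sample mean and std of x, so
    z_j has mean mu * sum(w_j) + b_j and std sigma * ||w_j||. The range covers
    mean +/- k standard deviations over all output units.
    """
    mu, sigma = x.mean(), x.std()
    mean_z = mu * w_sum + b
    std_z = sigma * w_norm
    return float((mean_z - k * std_z).min()), float((mean_z + k * std_z).max())

# Toy layer and input (random data, for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 256)) / 16.0
b = np.zeros(64)
x = rng.normal(size=256)

w_sum, w_norm = precompute(W)
lo, hi = surrogate_range(x, w_sum, w_norm, b)   # range predicted from input statistics
z = W @ x + b                                   # true pre-activations
z_hat = quantize(z, lo, hi)                     # quantized with the predicted range

# The predicted range adapts to the input: a 10x larger input widens it 10x.
lo10, hi10 = surrogate_range(10 * x, w_sum, w_norm, b)
print("predicted range:", (lo, hi), "true range:", (z.min(), z.max()))
print("max abs error:", np.abs(z_hat - z).max())
```

In this sketch the per-input cost is one pass over the input to get its mean and standard deviation plus a few vector operations, and the only extra storage is two vectors of per-row weight statistics. Whether the paper's surrogate takes this form is not stated in the abstract.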

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Adversarial Robustness in Machine Learning

Methods: Sparse Evolutionary Training