A Data-Free Analytical Quantization Scheme for Deep Learning Models

Ahmed Luqman; Khuzemah Qazi; Murray Patterson; Malik Jahan Khan; Imdadullah Khan

arXiv:2412.07391 · cs.CV · September 10, 2025

Open Access

TL;DR

This paper presents a novel data-free post-training quantization method for deep learning models that reduces model size and computational demands while maintaining accuracy, enabling deployment on resource-limited devices.

Contribution

It introduces a new quantization scheme that finds optimal clipping thresholds and scaling factors with mathematical guarantees, without requiring training data.

Findings

01. Significantly reduces model size and computational requirements.

02. Preserves model accuracy after quantization.

03. Works effectively on real-world datasets.

Abstract

Despite the success of CNN models on a variety of image classification and segmentation tasks, their extensive computational and storage demands pose considerable challenges for real-world deployment on resource-constrained devices. Quantization aims to alleviate these storage requirements and speed up inference by reducing the precision of model parameters to lower-bit representations. In this paper, we introduce a novel post-training quantization method for model weights. Our method finds optimal clipping thresholds and scaling factors, and comes with mathematical guarantees that it minimizes quantization noise. Empirical results on real-world datasets demonstrate that our quantization scheme significantly reduces model size and computational requirements while preserving model accuracy.
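To make the roles of the clipping threshold and scaling factor concrete, here is a minimal sketch of symmetric uniform weight quantization. It is not the paper's method: the abstract describes an analytical solution, whereas this sketch substitutes a brute-force grid search over candidate thresholds and picks the one with the lowest mean squared quantization error. All function names (`quantize`, `dequantize`, `search_clip`), the Gaussian test weights, and the 4-bit setting are assumptions chosen for illustration.

```python
import random


def quantize(weights, clip, bits):
    """Clip weights to [-clip, clip] and round them onto a signed integer grid."""
    levels = 2 ** (bits - 1) - 1      # largest integer code, e.g. 7 for 4 bits
    scale = clip / levels             # scaling factor: real-valued step per code
    codes = []
    for w in weights:
        clipped = max(-clip, min(clip, w))
        codes.append(round(clipped / scale))
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_clip(weights, bits, steps=100):
    """Grid-search stand-in (assumption) for an analytically derived threshold."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return 0.0, 0.0               # all-zero weights quantize exactly
    best_clip, best_err = max_abs, float("inf")
    for i in range(1, steps + 1):
        clip = max_abs * i / steps    # candidate threshold
        codes, scale = quantize(weights, clip, bits)
        err = mse(weights, dequantize(codes, scale))
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip, best_err


# Synthetic bell-shaped weights (assumption; no training data is involved).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]
best_clip, best_err = search_clip(weights, bits=4)
```

Printing `best_clip` next to `max(abs(w) for w in weights)` shows the trade-off at work: a tighter threshold clips a few outliers but gives every remaining weight a finer grid, which is why choosing the threshold well matters for quantization noise.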

Taxonomy

Topics: Neural Networks and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings