A Data-Free Analytical Quantization Scheme for Deep Learning Models
Ahmed Luqman, Khuzemah Qazi, Murray Patterson, Malik Jahan Khan, Imdadullah Khan

TL;DR
This paper presents a novel data-free post-training quantization method for deep learning models that reduces model size and computational demands while maintaining accuracy, enabling deployment on resource-limited devices.
Contribution
It introduces a new quantization scheme that finds optimal clipping thresholds and scaling factors with mathematical guarantees, without requiring training data.
Findings
Significantly reduces model size and computational requirements.
Preserves model accuracy after quantization.
Works effectively on real-world datasets.
Abstract
Despite the success of CNN models on a variety of Image classification and segmentation tasks, their extensive computational and storage demands pose considerable challenges for real-world deployment on resource-constrained devices. Quantization is one technique that aims to alleviate these large storage requirements and speed up the inference process by reducing the precision of model parameters to lower-bit representations. In this paper, we introduce a novel post-training quantization method for model weights. Our method finds optimal clipping thresholds and scaling factors along with mathematical guarantees that our method minimizes quantization noise. Empirical results on real-world datasets demonstrate that our quantization scheme significantly reduces model size and computational requirements while preserving model accuracy.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNeural Networks and Applications
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
