HPTQ: Hardware-Friendly Post Training Quantization
Hai Victor Habi, Reuven Peretz, Elad Cohen, Lior Dikstein, Oranit Dror, Idit Diamant, Roy H. Jennings, Arnon Netzer

TL;DR
HPTQ is a post-training quantization framework whose quantizers are all hardware-friendly (uniform, symmetric, and with power-of-two thresholds), making the quantized networks efficient to deploy on edge devices.
Contribution
The paper proposes HPTQ, a post-training quantization framework that combines several known quantization methods so that every quantizer is uniform, symmetric, and uses a power-of-two threshold.
Findings
Achieves competitive accuracy under hardware-friendly constraints
Supports a wide range of network architectures and tasks
Demonstrates effectiveness on classification, detection, segmentation, and pose estimation
Abstract
Neural network quantization enables the deployment of models on edge devices. An essential requirement for their hardware efficiency is that the quantizers are hardware-friendly: uniform, symmetric, and with power-of-two thresholds. To the best of our knowledge, current post-training quantization methods do not support all of these constraints simultaneously. In this work, we introduce a hardware-friendly post training quantization (HPTQ) framework, which addresses this problem by synergistically combining several known quantization methods. We perform a large-scale study on four tasks: classification, object detection, semantic segmentation and pose estimation over a wide variety of network architectures. Our extensive experiments show that competitive results can be obtained under hardware-friendly constraints.
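To make the three hardware-friendly constraints concrete, here is a minimal Python sketch of a quantizer that is uniform (evenly spaced levels), symmetric (centered on zero), and uses a power-of-two threshold. It is an illustration under stated assumptions, not the HPTQ implementation: the function names are hypothetical, and the threshold is simply the smallest power of two that covers the largest absolute value, whereas the abstract does not specify how HPTQ selects its thresholds.

```python
import math


def power_of_two_threshold(values):
    """Smallest power of two >= max(|v|).

    Assumption: a simple no-clipping rule chosen for illustration only.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(max_abs))


def quantize(values, threshold, n_bits=8):
    """Uniform, symmetric, signed quantization.

    The grid has 2**n_bits levels with step = threshold / 2**(n_bits - 1).
    With a power-of-two threshold the step is also a power of two.
    """
    levels = 2 ** (n_bits - 1)
    step = threshold / levels
    quantized = []
    for v in values:
        q = round(v / step)                    # nearest integer level
        q = max(-levels, min(levels - 1, q))   # clip to the signed range
        quantized.append(q * step)             # map back to real values
    return quantized


weights = [0.75, -0.3, 0.1, -1.2]
t = power_of_two_threshold(weights)
dequantized = quantize(weights, t, n_bits=4)
print(t, dequantized)
```

Because the step size is a power of two, rescaling between integer levels and real values reduces to a bit shift in fixed-point hardware, which is one reason this constraint is considered hardware-friendly.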
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
