EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization
Ofir Gordon, Elad Cohen, Hai Victor Habi, Arnon Netzer

TL;DR
This paper introduces EPTQ, a post-training quantization (PTQ) method that uses Hessian-guided, network-wise optimization to quantize neural networks for deployment on edge devices, and that remains effective when only a small representative dataset is available.
Contribution
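EPTQ optimizes the quantized network as a whole rather than layer by layer, so the optimization accounts for cross-layer dependencies. The process is guided by a Hessian-based, label-free score, and the method also improves the selection of weight quantization parameters for better accuracy.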
Findings
Achieves state-of-the-art results on ImageNet, COCO, and Pascal-VOC datasets.
Uses a Hessian upper bound to focus the optimization on the layers that are most sensitive to quantization.
Improves quantization performance with small representative datasets.
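The idea of weighting layers by a Hessian-derived sensitivity score can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary does not spell out the bound, so the sketch assumes the score is the squared Frobenius norm of a layer-to-output Jacobian, estimated with random probes in the style of Hutchinson's trace estimator. The function names `jacobian_norm_score` and `weighted_network_loss` are hypothetical, and the score here is per layer only, whereas the abstract describes a sample-layer score.

```python
import numpy as np

def jacobian_norm_score(jacobian, num_probes=64, seed=0):
    """Estimate ||J||_F^2 = trace(J J^T) with random Gaussian probes.

    Assumption: `jacobian` (out_dim x in_dim) stands in for the derivative of
    the network output with respect to one layer's activations.
    """
    rng = np.random.default_rng(seed)
    probes = rng.standard_normal((num_probes, jacobian.shape[0]))
    # For v ~ N(0, I), E[||J^T v||^2] = trace(J J^T) = ||J||_F^2.
    projected = probes @ jacobian  # row i holds (J^T v_i)^T
    return float(np.mean(np.sum(projected ** 2, axis=1)))

def weighted_network_loss(float_acts, quant_acts, scores):
    """Combine per-layer activation errors using normalized sensitivity scores."""
    weights = np.asarray(scores, dtype=float) / np.sum(scores)
    errors = [np.mean((f - q) ** 2) for f, q in zip(float_acts, quant_acts)]
    return float(np.dot(weights, errors))
```

No labels appear anywhere in the computation, which is what makes a score of this kind compatible with a PTQ setting where only unlabeled representative data is available.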
Abstract
Quantization is a key method for deploying deep neural networks on edge devices with limited memory and computation resources. Recent improvements in Post-Training Quantization (PTQ) methods were achieved by an additional local optimization process for learning the weight quantization rounding policy. However, a gap exists when employing network-wise optimization with small representative datasets. In this paper, we propose a new method for enhanced PTQ (EPTQ) that employs a network-wise quantization optimization process, which benefits from considering cross-layer dependencies during optimization. EPTQ enables network-wise optimization with a small representative dataset using a novel sample-layer attention score based on a label-free Hessian matrix upper bound. The label-free approach makes our method suitable for the PTQ scheme. We give a theoretical analysis for the said bound and…
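To make "learning the weight quantization rounding policy" concrete, here is a small sketch of uniform symmetric quantization in which each weight can be rounded down or up. It is an illustration under assumptions, not EPTQ itself: the `quantize` helper, the 4-bit width, the scale of 1.0, and the hand-picked mask are all hypothetical, and a real method would learn the mask by optimization rather than set it by hand.

```python
import numpy as np

def quantize(weights, scale, round_up_mask=None, n_bits=4):
    """Uniform symmetric quantization with a per-weight rounding choice."""
    q_min, q_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    scaled = weights / scale
    if round_up_mask is None:
        ints = np.round(scaled)  # round-to-nearest baseline
    else:
        # Rounding policy: floor everything, then add 1 where the mask is set.
        ints = np.floor(scaled) + round_up_mask.astype(scaled.dtype)
    return np.clip(ints, q_min, q_max) * scale

weights = np.array([0.4, 0.4, 0.4])
inputs = np.array([1.0, 1.0, 1.0])

nearest = quantize(weights, scale=1.0)
chosen = quantize(weights, scale=1.0, round_up_mask=np.array([1, 0, 0]))

# Compare the layer output error, not the per-weight error.
float_out = weights @ inputs
nearest_err = abs(float_out - nearest @ inputs)
chosen_err = abs(float_out - chosen @ inputs)
```

Round-to-nearest maps every weight to 0, so the layer output collapses to 0 while the floating-point output is 1.2. Rounding a single weight up gives an output of 1, which is much closer. Per-weight rounding error and output error can disagree in this way, which is why the rounding choice is treated as something to optimize.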
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
