EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization

Ofir Gordon; Elad Cohen; Hai Victor Habi; Arnon Netzer

arXiv:2309.11531 · cs.CV · September 27, 2024 · 1 citation

TL;DR

This paper introduces EPTQ, a post-training quantization method for deploying neural networks on edge devices. It optimizes quantization across the whole network rather than locally, guided by a label-free Hessian upper bound, so that the optimization works with a small representative dataset.

Contribution

EPTQ employs a Hessian-based, label-free approach for network-wise optimization, considering cross-layer dependencies, and enhances weight quantization parameter selection for better accuracy.

Findings

01. Achieves state-of-the-art results on the ImageNet, COCO, and Pascal-VOC datasets.

02. Uses Hessian upper bounds to focus the optimization on the most sensitive layers.

03. Improves quantization performance when only a small representative dataset is available.

Abstract

Quantization is a key method for deploying deep neural networks on edge devices with limited memory and computation resources. Recent improvements in Post-Training Quantization (PTQ) methods were achieved by an additional local optimization process for learning the weight quantization rounding policy. However, a gap exists when employing network-wise optimization with small representative datasets. In this paper, we propose a new method for enhanced PTQ (EPTQ) that employs a network-wise quantization optimization process, which benefits from considering cross-layer dependencies during optimization. EPTQ enables network-wise optimization with a small representative dataset using a novel sample-layer attention score based on a label-free Hessian matrix upper bound. The label-free approach makes our method suitable for the PTQ scheme. We give a theoretical analysis for the said bound and…
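To make the phrase "learning the weight quantization rounding policy" concrete, here is a minimal toy sketch in plain Python. It is not the EPTQ algorithm and does not use the Hessian-based score: it only illustrates, under assumed toy values, why choosing whether to round each weight up or down can beat round-to-nearest when the goal is to preserve the layer output. All names (`quantize`, `output_error`, `nearest_policy`) and all numbers are hypothetical, and the exhaustive search stands in for the gradient-based optimization that practical PTQ methods use.

```python
import itertools
import math

def quantize(weights, scale, round_up):
    # Floor each weight onto the uniform grid, then add a per-weight 0/1 rounding decision.
    return [scale * (math.floor(w / scale) + up) for w, up in zip(weights, round_up)]

def output_error(weights, q_weights, samples):
    # Sum of squared differences between the float and quantized outputs of one linear unit.
    total = 0.0
    for x in samples:
        y = sum(w * xi for w, xi in zip(weights, x))
        y_q = sum(q * xi for q, xi in zip(q_weights, x))
        total += (y - y_q) ** 2
    return total

def nearest_policy(weights, scale):
    # Round-to-nearest expressed as 0/1 decisions on top of floor.
    return tuple(int(w / scale - math.floor(w / scale) >= 0.5) for w in weights)

# Hypothetical toy data: three weights, a coarse grid, two representative samples.
weights = [0.4, 0.4, 0.4]
scale = 1.0
samples = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]

nearest = nearest_policy(weights, scale)
nearest_err = output_error(weights, quantize(weights, scale, nearest), samples)

# Exhaustive search over all 2^n rounding policies (feasible only for tiny n);
# an assumption of this sketch, standing in for a learned rounding policy.
best = min(
    itertools.product((0, 1), repeat=len(weights)),
    key=lambda p: output_error(weights, quantize(weights, scale, p), samples),
)
best_err = output_error(weights, quantize(weights, scale, best), samples)

print("nearest policy:", nearest, "output error:", round(nearest_err, 4))
print("best policy:   ", best, "output error:", round(best_err, 4))
```

In this toy case round-to-nearest sends every weight to zero, so the individual errors all share a sign and accumulate in the output, whereas rounding one weight up lets the errors partially cancel. The search here scores only a single unit's output; the abstract's point is that EPTQ instead optimizes network-wise, which lets it account for cross-layer dependencies.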

Code & Models

Repositories

sony/model_optimization (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Knowledge Distillation