NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search

Edouard Yvinec; Arnaud Dapogny; Kevin Bailly

arXiv:2308.05600 · cs.LG · August 11, 2023 · 1 citation


TL;DR

NUPES introduces a non-uniform post-training quantization method for deep neural networks and large language models, optimizing power functions to improve compression and inference efficiency while maintaining accuracy.

Contribution

The paper proposes optimizing the quantization operator itself by searching over the exponent of a power function, so that the quantization grid better fits the weight distribution and its outliers.
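To make the idea of searching over a power exponent concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the operator form (raise normalized magnitudes to a power, round, then invert the power), the helper names `power_quantize` and `search_exponent`, the candidate grid, and the use of an exhaustive grid search with a mean-squared-error criterion are all assumptions chosen for illustration.

```python
import math
import random


def power_quantize(values, bits, exponent):
    """Quantize with an assumed power operator and return dequantized values.

    Hypothetical form: magnitudes are normalized to [0, 1], raised to
    `exponent`, rounded to an integer code, then mapped back with the
    inverse power. exponent == 1.0 reduces to symmetric uniform quantization.
    """
    qmax = 2 ** (bits - 1) - 1                # largest signed integer code
    scale = max(abs(v) for v in values)       # normalization constant
    out = []
    for v in values:
        t = (abs(v) / scale) ** exponent      # warped magnitude in [0, 1]
        code = round(t * qmax)                # integer code in 0..qmax
        mag = (code / qmax) ** (1.0 / exponent) * scale
        out.append(math.copysign(mag, v))     # restore the sign
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_exponent(values, bits, candidates):
    """Grid search (an illustrative stand-in) for the lowest-error exponent."""
    errors = {a: mse(values, power_quantize(values, bits, a)) for a in candidates}
    best = min(errors, key=errors.get)
    return best, errors[best]


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(5_000)]  # bell-shaped toy weights
candidates = [i / 10 for i in range(2, 11)]               # 0.2, 0.3, ..., 1.0

uniform_err = mse(weights, power_quantize(weights, bits=4, exponent=1.0))
best_exponent, best_err = search_exponent(weights, bits=4, candidates=candidates)
print(f"uniform error: {uniform_err:.5f}")
print(f"best exponent: {best_exponent}, error: {best_err:.5f}")
```

Because the candidate grid includes 1.0, the selected exponent can never do worse than the uniform baseline on this reconstruction criterion. Whether a given exponent helps in practice depends on the weight distribution and on the accuracy metric that actually matters.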

Findings

01. Achieves state-of-the-art compression rates.

02. Compatible with integer-only low-bit inference.

03. Effective in both data-free and data-driven settings.

Abstract

Deep neural network (DNN) deployment has been confined to larger hardware devices because of the networks' expensive computational requirements. This challenge has recently reached another scale with the emergence of large language models (LLMs). Quantization is a promising technique for reducing both their memory footprint and their latency. It consists of converting floating-point representations to low-bit-width fixed-point representations, usually by assuming a uniform mapping onto a regular grid. This process, referred to in the literature as uniform quantization, may however be ill-suited, as most DNN weights and activations follow a bell-shaped distribution. This is even worse on LLMs, whose weight distributions are known to exhibit large, high-impact outlier values. In this work, we propose an improvement over the most commonly adopted way to tackle this limitation in deep learning models…
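The limitation of uniform quantization described in the abstract can be illustrated with a small sketch in plain Python. The symmetric scheme, the helper name `uniform_quantize`, the Gaussian toy weights, and the outlier value of 40.0 are assumptions made for illustration, not details taken from the paper.

```python
import random


def uniform_quantize(values, bits):
    """Symmetric uniform quantization onto a regular grid.

    Returns (dequantized values, integer codes).
    """
    qmax = 2 ** (bits - 1) - 1                    # 7 for signed 4-bit codes
    scale = max(abs(v) for v in values) / qmax    # grid step set by the largest magnitude
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [c * scale for c in codes], codes


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # bell-shaped toy weights
with_outlier = weights + [40.0]                            # same weights plus one outlier

deq_plain, codes_plain = uniform_quantize(weights, bits=4)
deq_outlier, codes_outlier = uniform_quantize(with_outlier, bits=4)

# Compare the error on the bulk of the weights only (the outlier is excluded).
err_plain = mse(weights, deq_plain)
err_outlier = mse(weights, deq_outlier[:-1])
zero_fraction = codes_outlier[:-1].count(0) / len(weights)

print(f"error without outlier: {err_plain:.4f}")
print(f"error with outlier:    {err_outlier:.4f}")
print(f"fraction of weights rounded to zero: {zero_fraction:.3f}")
```

In this toy setting a single outlier stretches the grid step so far that most of the bell-shaped bulk collapses onto the zero code, which is the kind of mismatch a non-uniform grid is meant to reduce.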


Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification