NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search
Edouard Yvinec, Arnaud Dapogny, Kevin Bailly

TL;DR
NUPES is a non-uniform post-training quantization method for deep neural networks and large language models. It searches for the exponent of a power-function quantizer to improve compression and inference efficiency while maintaining accuracy.
Contribution
The paper optimizes the quantization operator itself rather than only the quantized values: it searches for the exponent of a power function so that the quantization grid better fits the weight distribution and its outliers.
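To make the idea of searching over a power exponent concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names, the symmetric signed grid, the mean-squared-error objective, and the plain grid search over candidate exponents are all assumptions chosen for illustration. The only idea taken from the summary above is that the quantizer is a power function whose exponent is tuned.

```python
import numpy as np

def power_quantize(w, a, bits=4):
    """Quantize w on a grid that is uniform in the |w|**a domain.

    Hypothetical helper: symmetric signed grid, assumes w is not all zeros.
    """
    levels = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(w)) ** a / levels      # step size in the power domain
    q = np.round(np.sign(w) * np.abs(w) ** a / scale)
    return q.astype(np.int32), scale

def power_dequantize(q, scale, a):
    """Map integer codes back to floats by inverting the power transform."""
    return np.sign(q) * (np.abs(q) * scale) ** (1.0 / a)

def search_exponent(w, bits=4, candidates=None):
    """Pick the exponent with the lowest reconstruction MSE.

    Assumption: a simple grid search stands in for whatever optimizer
    the paper actually uses.
    """
    if candidates is None:
        candidates = np.linspace(0.1, 1.0, 19)   # a = 1.0 is the uniform case
    best_a, best_err = None, np.inf
    for a in candidates:
        q, scale = power_quantize(w, a, bits)
        err = np.mean((w - power_dequantize(q, scale, a)) ** 2)
        if err < best_err:
            best_a, best_err = float(a), float(err)
    return best_a, best_err
```

With an exponent of 1 this reduces to a uniform grid. With an exponent below 1, the dequantized levels are spaced more densely near zero and more sparsely in the tails, which is the property that makes a power grid a candidate fit for bell-shaped weights.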
Findings
Achieves state-of-the-art compression rates.
Compatible with integer-only low-bit inference.
Effective in both data-free and data-driven settings.
Abstract
Deep neural network (DNN) deployment has been confined to larger hardware devices due to their expensive computational requirements. This challenge has recently reached another scale with the emergence of large language models (LLMs). In order to reduce both their memory footprint and latency, a promising technique is quantization. It consists in converting floating point representations to low bit-width fixed point representations, usually by assuming a uniform mapping onto a regular grid. This process, referred to in the literature as uniform quantization, may however be ill-suited as most DNN weights and activations follow a bell-shaped distribution. This is even worse on LLMs whose weight distributions are known to exhibit large, high impact, outlier values. In this work, we propose an improvement over the most commonly adopted way to tackle this limitation in deep learning models…
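The uniform mapping described in the abstract, and its sensitivity to outliers, can be sketched as follows. This is an illustrative example only: the function names, the symmetric signed grid, the synthetic Gaussian weights, and the outlier value of 50 are assumptions, not data or code from the paper.

```python
import numpy as np

def uniform_quantize(x, bits=8):
    """Map floats onto a regular signed integer grid (symmetric, per-tensor)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / levels           # one step of the regular grid
    q = np.clip(np.round(x / scale), -levels, levels).astype(np.int32)
    return q, scale

def uniform_dequantize(q, scale):
    """Recover approximate floats from integer codes."""
    return q * scale

def quantization_mse(x, bits):
    """Mean squared reconstruction error of uniform quantization."""
    q, scale = uniform_quantize(x, bits)
    return float(np.mean((x - uniform_dequantize(q, scale)) ** 2))

# Synthetic bell-shaped "weights" (assumption: standard normal samples).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=10_000)

# The same weights plus one large outlier, which stretches the grid.
with_outlier = np.append(weights, 50.0)

mse_plain = quantization_mse(weights, bits=4)
mse_outlier = quantization_mse(with_outlier, bits=4)
print(f"4-bit MSE without outlier: {mse_plain:.4f}")
print(f"4-bit MSE with outlier:    {mse_outlier:.4f}")
```

Because the step size is set by the largest magnitude, a single outlier widens every step of the grid, so most of the small weights collapse onto very few levels. This is the limitation of the regular grid that the abstract points to.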
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
