MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization

Aozhong Zhang; Naigang Wang; Yanxia Deng; Xin Li; Zi Yang; Penghang Yin

arXiv:2406.00800·cs.LG·October 18, 2024

TL;DR

MagR is a simple, efficient preprocessing technique that reduces weight magnitudes to improve post-training quantization, achieving state-of-the-art results without adding inference overhead.
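To see why a smaller weight range helps, consider a plain round-to-nearest uniform quantizer: its step size is proportional to the range it has to cover, so a single outlier inflates the rounding error of every other weight. The sketch below is only an illustration of that motivation. The quantizer, the helper name `quantize_uniform`, and the synthetic weights are assumptions, not the quantizer or data used in the paper.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Round-to-nearest uniform quantization over [w.min(), w.max()].

    Illustrative helper (hypothetical name), not the paper's quantizer.
    Returns the dequantized weights and the quantization step size.
    """
    levels = 2 ** bits - 1            # number of intervals on the grid
    lo, hi = w.min(), w.max()
    step = (hi - lo) / levels         # grid spacing grows with the range
    q = np.round((w - lo) / step)     # integer codes in [0, levels]
    return q * step + lo, step

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096)  # synthetic, well-behaved weights
w_outlier = w.copy()
w_outlier[0] = 0.5                    # inject one large-magnitude outlier

wq, step_plain = quantize_uniform(w, bits=4)
wq_outlier, step_outlier = quantize_uniform(w_outlier, bits=4)

print("step without outlier:", step_plain)
print("step with outlier:   ", step_outlier)
print("mean abs error without outlier:", np.abs(wq - w).mean())
print("mean abs error with outlier:   ", np.abs(wq_outlier - w_outlier).mean())
```

The worst-case rounding error of such a quantizer is half a step, so any preprocessing that shrinks the maximum magnitude tightens that bound at a fixed bit width.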

Contribution

We introduce MagR, a novel non-linear preprocessing method using $\ell_\infty$-regularization to enhance post-training quantization performance.
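One plausible way to write such an $\ell_\infty$-regularized layer-wise problem is given below. The notation is an assumption made for illustration: $X$ stands for the layer's input features, $\hat{w}$ for one column of the pre-trained weights, $\alpha > 0$ for the regularization strength, and $t > 0$ for a step size; none of these symbols appear on this page.

```latex
% Assumed per-column objective: stay close in layer output, shrink the largest entry
\min_{w} \; \tfrac{1}{2}\,\lVert Xw - X\hat{w} \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_\infty

% Proximal operator of the l_inf norm via Moreau decomposition:
% subtract the Euclidean projection onto the l_1 ball of radius t*alpha
\operatorname{prox}_{t\alpha\lVert\cdot\rVert_\infty}(v)
  \;=\; v \;-\; \operatorname{proj}_{\{u \,:\, \lVert u\rVert_1 \le t\alpha\}}(v)
```

The first term keeps the layer's output close to the original, and the second penalizes only the largest-magnitude entry, which is what pulls outliers in without forcing the remaining weights toward zero.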

Findings

01

Achieves state-of-the-art quantization performance on Llama models.

02

No additional inference overhead introduced by MagR.

03

Significantly improves perplexity on Wikitext2 for large models.

Abstract

In this paper, we present a simple optimization-based preprocessing technique called Weight Magnitude Reduction (MagR) to improve the performance of post-training quantization. For each linear layer, we adjust the pre-trained floating-point weights by solving an $\ell_\infty$-regularized optimization problem. This process greatly diminishes the maximum magnitude of the weights and smooths out outliers, while preserving the layer's output. The preprocessed weights are centered more towards zero, which facilitates the subsequent quantization process. To implement MagR, we address the $\ell_\infty$-regularization by employing an efficient proximal gradient descent algorithm. Unlike existing preprocessing methods that involve linear transformations and subsequent post-processing steps, which can introduce significant overhead at inference time, MagR functions as a non-linear transformation,…
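A minimal NumPy sketch of proximal gradient descent on an $\ell_\infty$-regularized least-squares problem is shown below. It is not the authors' implementation: it handles a single weight vector, and the function names (`project_l1_ball`, `prox_linf`, `reduce_magnitude`), the value of `alpha`, the iteration count, and the synthetic data are all assumptions. The proximal step uses the standard identity that the prox of the $\ell_\infty$ norm equals the input minus its projection onto an $\ell_1$ ball.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {u : ||u||_1 <= radius} (sort-based method)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                  # magnitudes, descending
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]  # last index kept nonzero
    theta = (css[rho] - radius) / (rho + 1)        # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, t):
    """Prox of t * ||.||_inf via Moreau decomposition."""
    return v - project_l1_ball(v, t)

def reduce_magnitude(X, w0, alpha, n_iter=200):
    """Hypothetical helper: approximately minimize
    0.5 * ||X w - X w0||^2 + alpha * ||w||_inf by proximal gradient descent."""
    H = X.T @ X
    L = np.linalg.eigvalsh(H)[-1]    # largest eigenvalue = gradient Lipschitz constant
    w = w0.copy()
    for _ in range(n_iter):
        grad = H @ (w - w0)          # gradient of the output-matching term
        w = prox_linf(w - grad / L, alpha / L)
    return w

# Synthetic example: fewer samples than features, so X has a null space
# and the weights can move without changing the layer output much.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 128))
w0 = rng.normal(0.0, 0.02, size=128)
w0[3] = 0.5                          # one outlier weight
alpha = 1e-2
w = reduce_magnitude(X, w0, alpha)

print("max |w| before:", np.abs(w0).max())
print("max |w| after: ", np.abs(w).max())
print("output change: ", np.linalg.norm(X @ w - X @ w0))
```

With step size `1/L` the objective never increases from its starting value `alpha * ||w0||_inf`, so the maximum magnitude cannot grow and the output mismatch stays bounded by that value. The result is just another set of floating-point weights with the same shape, so nothing extra has to run at inference time.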

Code & Models

Repositories

aozhongzhang/magr
PyTorch · Official

Videos

MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization · SlidesLive

Taxonomy

Methods: LLaMA