OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

Xing Hu; Yuan Cheng; Dawei Yang; Zukang Xu; Zhihang Yuan; Jiangyong Yu; Chen Xu; Zhe Jiang; Sifan Zhou

arXiv:2501.13987·cs.LG·January 27, 2025

TL;DR

OSTQuant introduces a novel learnable transformation approach for LLM post-training quantization, effectively optimizing data distribution in the quantization space to improve model accuracy and outperform existing methods.

Contribution

The paper proposes OSTQuant, a method using orthogonal and scaling transformations guided by QSUR to enhance quantization of LLMs, with a new loss function for better optimization.
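A minimal sketch of why orthogonal and scaling transformations can be inserted without changing a layer's output, assuming a toy linear layer y = xW. The matrices `x`, `W`, `Q`, and `S` below are hypothetical values chosen for illustration; in the paper the transformations are learned, and this sketch does not reproduce that optimization, QSUR, or the proposed loss.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

# Hypothetical toy layer: one activation row x (1x2) and weights W (2x2).
x = [[8.0, 0.0]]          # channel 0 carries an outlier
W = [[0.5, -1.0],
     [2.0, 0.25]]

# Orthogonal transform: a 45-degree rotation, so Q times its transpose is I.
c = math.cos(math.pi / 4)
Q = [[c, -c],
     [c,  c]]

# Diagonal scaling transform and its inverse (arbitrary illustrative values).
S = [[2.0, 0.0], [0.0, 0.5]]
S_inv = [[0.5, 0.0], [0.0, 2.0]]

# The rotation alone spreads the outlier across both channels.
x_rot = matmul(x, Q)

# Transform the activations and fold the inverse transform into the weights.
x_t = matmul(x_rot, S)                        # x Q S
W_t = matmul(matmul(S_inv, transpose(Q)), W)  # S^-1 Q^T W

y_ref = matmul(x, W)      # original output
y_t = matmul(x_t, W_t)    # output of the transformed pair
```

Because Q is orthogonal and S is invertible, the two transforms cancel inside the product, so `y_t` matches `y_ref` up to floating-point error, while the largest magnitude in `x_rot` is smaller than the outlier in `x`. Only the distributions that the quantizer sees change.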

Findings

01

Retains 99.5% of the floating-point accuracy in the W4-only (4-bit weights) setting.

02

Reduces the performance gap by 32% on LLaMA-3-8B, compared with state-of-the-art methods, in the W4A4KV4 configuration.

03

Outperforms existing methods on various LLM benchmarks.

Abstract

Post-training quantization (PTQ) has emerged as a widely adopted technique for compressing and accelerating Large Language Models (LLMs). The major challenge in LLM quantization is that uneven and heavy-tailed data distributions can expand the quantization range, thereby reducing bit precision for most values. Recent methods attempt to eliminate outliers and balance inter-channel differences by employing linear transformations; however, they remain heuristic and often overlook optimizing the data distribution across the entire quantization space. In this paper, we introduce Quantization Space Utilization Rate (QSUR), a novel metric that effectively assesses the quantizability of transformed data by measuring the space utilization of the data in the quantization space. We complement QSUR with mathematical derivations that examine the effects and limitations of various transformations,…
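The range-expansion problem described in the abstract can be illustrated with a small sketch, assuming a symmetric uniform 4-bit quantizer whose step size is set by the largest absolute value. The helper names `quantize` and `mean_abs_error` and the sample data are hypothetical and chosen for illustration; this is not the paper's QSUR metric, only the effect that motivates it.

```python
def quantize(values, bits=4):
    """Symmetric uniform quantization; returns dequantized values and step size."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    step = max(abs(v) for v in values) / qmax  # range is set by the largest value
    deq = [max(-qmax, min(qmax, round(v / step))) * step for v in values]
    return deq, step

def mean_abs_error(values, deq):
    """Average absolute difference between original and dequantized values."""
    return sum(abs(v - d) for v, d in zip(values, deq)) / len(values)

# Hypothetical data: a well-behaved bulk in [-0.7, 0.7] plus one large outlier.
bulk = [0.1 * k for k in range(-7, 8)]
with_outlier = bulk + [14.0]

deq_bulk, step_bulk = quantize(bulk)
deq_out, step_out = quantize(with_outlier)

# Error on the bulk values, without and with the outlier in the same tensor.
err_bulk = mean_abs_error(bulk, deq_bulk)
err_bulk_with_outlier = mean_abs_error(bulk, deq_out[:len(bulk)])
```

With the outlier present the step size grows to 2.0, so every bulk value rounds to zero and the error on the bulk rises sharply. This is the sense in which a single heavy-tailed value reduces the effective precision available to most values.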

Code & Models

Repositories

brotherhappy/ostquant
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis