OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
Xing Hu, Yuan Cheng, Dawei Yang, Zukang Xu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang, Sifan Zhou

TL;DR
OSTQuant introduces a novel learnable transformation approach for LLM post-training quantization, effectively optimizing data distribution in the quantization space to improve model accuracy and outperform existing methods.
Contribution
The paper proposes OSTQuant, a method that uses orthogonal and scaling transformations, guided by the Quantization Space Utilization Rate (QSUR), to improve the quantization of LLMs, together with a new loss function for better optimization.
Findings
Retains 99.5% of floating-point accuracy in the W4-only setting.
Reduces the performance gap by 32% in the W4A4KV4 configuration.
Outperforms existing methods on various LLM benchmarks.
Abstract
Post-training quantization (PTQ) has emerged as a widely adopted technique for compressing and accelerating Large Language Models (LLMs). The major challenge in LLM quantization is that uneven and heavy-tailed data distributions can expand the quantization range, thereby reducing bit precision for most values. Recent methods attempt to eliminate outliers and balance inter-channel differences by employing linear transformations; however, they remain heuristic and often overlook optimizing the data distribution across the entire quantization space. In this paper, we introduce Quantization Space Utilization Rate (QSUR), a novel metric that effectively assesses the quantizability of transformed data by measuring the space utilization of the data in the quantization space. We complement QSUR with mathematical derivations that examine the effects and limitations of various transformations,…
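The range-expansion problem described in the abstract can be illustrated with a small sketch. It is not the paper's method: OSTQuant learns its orthogonal and scaling transformations, whereas the example below assumes a fixed normalized Hadamard rotation, a single per-tensor scale, and synthetic data with one injected outlier. All function and variable names are hypothetical.

```python
import random


def quantize(values, bits=4):
    """Symmetric uniform quantization with one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point, clamp, then map back to real values.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hadamard_transform(values):
    """Orthonormal Walsh-Hadamard transform; length must be a power of 2."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = n ** 0.5
    return [v / norm for v in out]


# Synthetic data (an assumption): Gaussian values plus a single large outlier.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
x[5] = 40.0

# Quantizing directly: the outlier stretches the range, so most values
# fall between the few coarse grid points near zero.
err_plain = mse(x, quantize(x))

# Rotating first spreads the outlier's energy over all coordinates.
rotated = hadamard_transform(x)
# The normalized Hadamard transform is its own inverse, so applying it
# again maps the quantized values back to the original basis.
restored = hadamard_transform(quantize(rotated))
err_rotated = mse(x, restored)

print("max |x| before rotation:", max(abs(v) for v in x))
print("max |x| after rotation: ", max(abs(v) for v in rotated))
print("4-bit MSE without rotation:", err_plain)
print("4-bit MSE with rotation:   ", err_rotated)
```

Because the rotation is orthogonal, it preserves the error norm when mapping back, so any reduction in error comes from the narrower range being covered by the same number of quantization levels.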
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
