OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
Zhikai Li, Zhen Dong, Xuewen Liu, Jing Zhang, Qingyi Gu

TL;DR
OSAQ suppresses weight outliers in LLMs with a method built on the low-rank structure of the Hessian, improving low-bit quantization without additional inference cost.
Contribution
The paper proposes a second-order null-space approach to outlier suppression in LLM quantization, enabling efficient additive weight transformations with no extra inference overhead.
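The null-space idea can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the layer-wise Hessian H = 2XXᵀ used by GPTQ-style methods, computed from calibration inputs X, and the helpers `null_space_basis` and `absorb_outliers` are hypothetical. Any perturbation ΔW whose rows lie in the null space of H satisfies ΔW X = 0, so it changes neither the layer outputs on the calibration data nor the second-order term tr(ΔW H ΔWᵀ). The sketch chooses that perturbation in closed form (least squares via a pseudo-inverse) to cancel the largest-magnitude entries of each row.

```python
import numpy as np


def null_space_basis(h: np.ndarray, rel_tol: float = 1e-8) -> np.ndarray:
    """Orthonormal basis (as columns) of the numerical null space of a symmetric PSD matrix."""
    eigvals, eigvecs = np.linalg.eigh(h)  # eigenvalues in ascending order
    return eigvecs[:, eigvals <= rel_tol * eigvals.max()]


def absorb_outliers(w: np.ndarray, h: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical sketch: cancel the k largest-magnitude entries in each row of w
    with an additive perturbation whose rows lie in the null space of h."""
    n = null_space_basis(h)  # shape (d_in, m); h @ n ~ 0 implies n.T @ x ~ 0
    w_new = w.copy()
    for i, row in enumerate(w):
        s = np.argsort(np.abs(row))[-k:]  # indices of this row's k largest |w_ij|
        # Closed-form least squares: minimum-norm c minimizing ||(row + n @ c)[s]||_2
        c = -np.linalg.pinv(n[s, :]) @ row[s]
        w_new[i] = row + n @ c  # the added vector is orthogonal to every calibration input
    return w_new


rng = np.random.default_rng(0)
d_in, rank, tokens = 64, 16, 512
x = rng.standard_normal((d_in, rank)) @ rng.standard_normal((rank, tokens))  # rank-16 inputs
h = 2.0 * x @ x.T  # layer-wise Hessian of ||W X - W' X||_F^2 (GPTQ convention)
w = rng.standard_normal((8, d_in))
w[:, 3] = 50.0  # plant one systematic outlier input channel
w_new = absorb_outliers(w, h, k=4)
print("max |w| before:", np.abs(w).max(), "after:", np.abs(w_new).max())
print("outputs preserved:", np.allclose(w @ x, w_new @ x))
```

Because the rank-16 toy inputs leave a 48-dimensional null space, four entries per row can be cancelled while `w_new @ x` still matches `w @ x`. Real activations are only approximately low-rank, which is why the sketch uses a relative eigenvalue tolerance (`rel_tol`) rather than exact zeros.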
Findings
Combined with GPTQ, OSAQ reduces perplexity by over 40% in 2-bit quantization.
The method effectively suppresses weight outliers and improves quantization accuracy.
It operates efficiently using a closed-form solution without iterative training.
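Why outliers hurt low-bit accuracy can be seen with plain per-output-channel round-to-nearest (RTN) quantization, a simpler baseline than GPTQ and not part of OSAQ itself. In this toy example (synthetic Gaussian weights with one planted outlier channel, an assumption for illustration; `rtn_quantize` is a hypothetical helper), the outlier stretches each row's range, so the 2-bit grid becomes too coarse for the ordinary weights.

```python
import numpy as np


def rtn_quantize(w: np.ndarray, bits: int = 2) -> np.ndarray:
    """Per-output-channel asymmetric round-to-nearest quantization; returns dequantized weights."""
    levels = 2**bits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-12) / levels  # step size is set by each row's full range
    q = np.clip(np.round((w - w_min) / scale), 0, levels)  # integer codes in [0, levels]
    return q * scale + w_min


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64))
w_outlier = w.copy()
w_outlier[:, 3] = 50.0  # hypothetical systematic outlier input channel

inliers = np.arange(w.shape[1]) != 3  # evaluate error only on the ordinary weights
err_clean = np.mean((rtn_quantize(w) - w)[:, inliers] ** 2)
err_outlier = np.mean((rtn_quantize(w_outlier) - w_outlier)[:, inliers] ** 2)
print(f"2-bit RTN MSE on ordinary weights: clean={err_clean:.3f}, with outlier={err_outlier:.3f}")
```

With the outlier present, the 2-bit step size exceeds 17, so in this example every ordinary weight rounds to the same level (the row minimum). Shrinking such entries while keeping layer outputs fixed is the motivation for suppressing outliers before quantization.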
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities. However, their massive parameter scale leads to significant resource consumption and latency during inference. Post-training weight-only quantization offers a promising solution by reducing model size and accelerating token generation by alleviating the memory-bound bottleneck. Nevertheless, the presence of inherent systematic outliers in weights continues to be a major obstacle. While existing methods, such as scaling and rotation, attempt to address this issue, their performance remains unsatisfactory. In this paper, we propose Outlier Self-Absorption Quantization (OSAQ), which performs additive weight suppression guided by the second-order low-rank property for low-bit weight-only quantization of LLMs. Specifically, we observe that the Hessian exhibits low-rank consistency across different inputs, with certain…
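One way to check the kind of low-rank consistency the abstract describes is to build Hessians from two different calibration batches and compare their numerical null spaces. The sketch below is a toy model, not the paper's experiment: the synthetic activations are drawn from a shared 16-dimensional subspace plus small noise (an assumption that makes consistency hold by construction), and `numerical_null_space`, `subspace_overlap`, and `hessian_from_batch` are hypothetical helpers. An overlap near 1.0 means both batches leave the same directions unexcited, which is the property a null-space correction computed on calibration data would rely on to carry over to other inputs.

```python
import numpy as np


def numerical_null_space(h: np.ndarray, rel_tol: float = 1e-6) -> np.ndarray:
    """Orthonormal eigenvectors of symmetric PSD h whose eigenvalues fall below rel_tol * max eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(h)
    return eigvecs[:, eigvals <= rel_tol * eigvals.max()]


def subspace_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between span(a) and span(b); 1.0 means identical."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(np.mean(cosines**2))


def hessian_from_batch(basis: np.ndarray, tokens: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical calibration batch: activations in span(basis) plus small isotropic noise."""
    d_in, rank = basis.shape
    x = basis @ rng.standard_normal((rank, tokens)) + 1e-4 * rng.standard_normal((d_in, tokens))
    return 2.0 * x @ x.T  # layer-wise Hessian in the GPTQ convention


rng = np.random.default_rng(0)
basis = rng.standard_normal((64, 16))  # toy assumption: a shared 16-dim activation subspace
h1 = hessian_from_batch(basis, tokens=256, rng=rng)
h2 = hessian_from_batch(basis, tokens=512, rng=rng)
n1, n2 = numerical_null_space(h1), numerical_null_space(h2)
print(n1.shape[1], n2.shape[1], round(subspace_overlap(n1, n2), 6))
```

On real LLM activations the eigenvalue spectrum decays gradually rather than dropping to zero, so the reported rank and overlap would depend on the chosen `rel_tol`.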
