OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization

Zhikai Li; Zhen Dong; Xuewen Liu; Jing Zhang; Qingyi Gu

arXiv:2605.04738 · cs.LG · May 12, 2026

TL;DR

OSAQ suppresses weight outliers in LLMs by exploiting the low-rank structure of the Hessian, improving low-bit weight-only quantization accuracy without additional inference cost.

Contribution

The paper proposes a second-order null space approach for outlier suppression in LLM quantization, enabling efficient, additive weight transformations with no extra inference overhead.
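
To make the null-space idea concrete, here is a minimal NumPy sketch under stated assumptions; it is not the paper's algorithm. It assumes a layer-wise proxy Hessian H = XᵀX built from synthetic calibration inputs X that are low-rank by construction, and it uses a plain Frobenius-norm objective as a stand-in for whatever suppression objective OSAQ actually optimizes. All names (`X`, `H`, `N`, `delta`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_samples, rank = 16, 8, 64, 10

# Assumption: calibration inputs lie in a rank-10 subspace, so H is rank-deficient.
basis = rng.standard_normal((rank, d_in))
X = rng.standard_normal((n_samples, rank)) @ basis   # (n_samples, d_in)
H = X.T @ X                                          # proxy Hessian, (d_in, d_in)

# Null-space basis: eigenvectors whose eigenvalues are numerically zero.
eigvals, eigvecs = np.linalg.eigh(H)
N = eigvecs[:, eigvals < eigvals.max() * 1e-10]      # orthonormal columns

W = rng.standard_normal((d_out, d_in))
W[0, 3] = 25.0                                       # planted outlier

# Additive update restricted to the null space: delta = Z @ N.T.
# The Z minimizing ||W + Z N^T||_F has the closed form Z = -W N.
delta = -(W @ N) @ N.T
W_new = W + delta

print("null-space dimension:", N.shape[1])
print("max output change:", np.abs(X @ W_new.T - X @ W.T).max())
print("Frobenius norm before/after:", np.linalg.norm(W), np.linalg.norm(W_new))
```

Because every row of `delta` lies in the null space of H, the layer output on the calibration inputs is unchanged up to floating-point error. The update is folded into the weights before quantization, so inference uses a single weight matrix as before.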

Findings

01. OSAQ achieves over 40% lower perplexity in 2-bit quantization with GPTQ.

02. The method effectively suppresses weight outliers and improves quantization accuracy.

03. It operates efficiently using a closed-form solution without iterative training.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities. However, their massive parameter scale leads to significant resource consumption and latency during inference. Post-training weight-only quantization offers a promising solution by reducing model size and accelerating token generation through alleviating the memory-bound issue. Nevertheless, the presence of inherent systematic outliers in weights continues to be a major obstacle. While existing methods, such as scaling and rotation, attempt to address this issue, the performance remains unsatisfactory. In this paper, we propose Outlier Self-Absorption Quantization (OSAQ), which performs additive weight suppression guided by the second-order low-rank property for low-bit weight-only quantization of LLMs. Specifically, we observe that the Hessian exhibits low-rank consistency across different inputs, with certain…
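
As a rough illustration of why weight outliers are an obstacle, the sketch below applies symmetric round-to-nearest quantization to a synthetic weight vector with and without a single large value. The quantizer, the 4-bit width, and the planted outlier of 20.0 are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def quantize_rtn(w, bits):
    """Symmetric per-tensor round-to-nearest quantization; returns dequantized values."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax        # one outlier inflates this step size
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024)             # synthetic "well-behaved" weights
w_out = w.copy()
w_out[0] = 20.0                           # planted outlier

mse_clean = np.mean((w - quantize_rtn(w, 4)) ** 2)
mse_out = np.mean((w_out - quantize_rtn(w_out, 4)) ** 2)
print("MSE without outlier:", mse_clean)
print("MSE with outlier:   ", mse_out)
```

The outlier stretches the quantization grid, so most ordinary weights collapse onto a few levels and the reconstruction error grows. Shrinking such values before quantization tightens the grid again.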
