D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs

Xianglong Yan, ChengZhu Bao, Zhiteng Li, Tianao Zhang, Shaoqiu Zhang, Ruobing Xie, Samm Sun, and Yulun Zhang

arXiv:2602.02546 · cs.LG · February 9, 2026

TL;DR

D$^2$Quant is a weight-only post-training quantization (PTQ) method for large language models. It improves accuracy at sub-4-bit precision by targeting two sources of error: the down-projection matrices that act as a quantization bottleneck, and the activation deviations that weight quantization induces.

Contribution

The paper proposes a Dual-Scale Quantizer and a Deviation-Aware Correction to improve low-bit PTQ accuracy without extra bit-width or hardware changes.

Findings

01 Achieves superior accuracy at sub-4-bit precision across multiple LLMs.

02 Effectively mitigates activation distribution shifts during quantization.

03 Provides a practical framework with open-source code for efficient LLM deployment.

Abstract

Large language models (LLMs) deliver strong performance, but their high compute and memory costs make deployment difficult in resource-constrained scenarios. Weight-only post-training quantization (PTQ) is appealing, as it reduces memory usage and enables practical speedup without low-bit operators or specialized hardware. However, accuracy often degrades significantly in weight-only PTQ at sub-4-bit precision, and our analysis identifies two main causes: (1) down-projection matrices are a well-known quantization bottleneck, but maintaining their fidelity often requires extra bit-width; (2) weight quantization induces activation deviations, but effective correction strategies remain underexplored. To address these issues, we propose D$^2$Quant, a novel weight-only PTQ framework that improves quantization from both the weight and activation perspectives. On the weight side, we design a…
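
To make the second cause in the abstract concrete, the sketch below quantizes a random weight matrix with plain group-wise round-to-nearest (RTN) and measures how far the layer output drifts from the full-precision output. This is a generic baseline for illustration only, not the D$^2$Quant method; the function names, the group size of 32, and the Gaussian test data are all assumptions made for this example.

```python
import random

def quantize_rtn(weights, bits):
    """Uniform asymmetric round-to-nearest quantization of one weight group.

    Returns the dequantized values (the numbers the layer would compute with).
    """
    levels = 2 ** bits - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when the group is constant (range is zero).
    scale = (w_max - w_min) / levels or 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return [w_min + c * scale for c in codes]

def quantize_matrix(matrix, bits, group_size):
    """Quantize each row in contiguous groups of `group_size` weights."""
    out = []
    for row in matrix:
        q_row = []
        for start in range(0, len(row), group_size):
            q_row.extend(quantize_rtn(row[start:start + group_size], bits))
        out.append(q_row)
    return out

def matvec(matrix, x):
    """Dense matrix-vector product, standing in for a linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]

def mean_abs_deviation(a, b):
    """Mean absolute difference between two output vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

# Hypothetical toy layer: 16 outputs, 64 inputs, Gaussian weights and input.
rng = random.Random(0)
W = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(16)]
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
y_ref = matvec(W, x)

# Activation deviation of the quantized layer at several bit-widths.
deviation = {
    bits: mean_abs_deviation(y_ref, matvec(quantize_matrix(W, bits, 32), x))
    for bits in (2, 3, 4, 8)
}
for bits, dev in deviation.items():
    print(f"{bits}-bit: mean |y - y_q| = {dev:.4f}")
```

Each weight's rounding error is bounded by half the group's step size, and the step size roughly doubles for every bit removed, so the output deviation is expected to grow quickly below 4 bits. Only the weights are quantized here; the input stays in full precision, which is what "weight-only" refers to.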

Taxonomy

Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Big Data and Digital Economy