D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs
Xianglong Yan, ChengZhu Bao, Zhiteng Li, Tianao Zhang, Shaoqiu Zhang, Ruobing Xie, Samm Sun, and Yulun Zhang

TL;DR
D$^2$Quant is a weight-only post-training quantization method for large language models that improves accuracy at sub-4-bit precision by addressing the down-projection quantization bottleneck and the activation deviations that weight quantization induces.
Contribution
It proposes a Dual-Scale Quantizer and Deviation-Aware Correction to improve low-bit PTQ accuracy without extra bit-width or hardware changes.
Findings
Achieves superior accuracy at sub-4-bit precision across multiple LLMs.
Effectively mitigates activation distribution shifts during quantization.
Provides a practical framework with open-source code for efficient LLM deployment.
Abstract
Large language models (LLMs) deliver strong performance, but their high compute and memory costs make deployment difficult in resource-constrained scenarios. Weight-only post-training quantization (PTQ) is appealing, as it reduces memory usage and enables practical speedup without low-bit operators or specialized hardware. However, accuracy often degrades significantly in weight-only PTQ at sub-4-bit precision, and our analysis identifies two main causes: (1) down-projection matrices are a well-known quantization bottleneck, but maintaining their fidelity often requires extra bit-width; (2) weight quantization induces activation deviations, but effective correction strategies remain underexplored. To address these issues, we propose D$^2$Quant, a novel weight-only PTQ framework that improves quantization from both the weight and activation perspectives. On the weight side, we design a…
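The abstract's second observation, that quantizing weights shifts the activations a layer produces, can be illustrated with a minimal sketch. It uses plain asymmetric round-to-nearest quantization as a generic baseline; it is not the paper's Dual-Scale Quantizer or Deviation-Aware Correction, and the function names, the group size of 128, and the Gaussian test data are all assumptions made for illustration.

```python
import random


def quantize_rtn(weights, bits):
    """Asymmetric round-to-nearest quantization of one weight group.

    Returns the dequantized weights (hypothetical helper, not from the paper).
    """
    levels = (1 << bits) - 1                    # number of steps between min and max
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0     # fall back to 1.0 for a constant group
    codes = [round((w - w_min) / scale) for w in weights]
    return [w_min + c * scale for c in codes]


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(128)]  # one weight row, treated as one group
x = [random.gauss(0.0, 1.0) for _ in range(128)]    # one input activation vector

for bits in (4, 3, 2):
    row_q = quantize_rtn(row, bits)
    max_w_err = max(abs(a - b) for a, b in zip(row, row_q))
    out_dev = abs(dot(row, x) - dot(row_q, x))      # deviation of the layer output
    print(f"{bits}-bit: max weight error {max_w_err:.4f}, output deviation {out_dev:.4f}")
```

Each weight's error is bounded by half a quantization step, and the step doubles roughly every time a bit is removed, so the deviation in the layer output tends to grow as precision drops below 4 bits. That output deviation is the quantity an activation-side correction would target.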
Taxonomy
Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Big Data and Digital Economy
