Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv, Yi Liu

TL;DR
This paper introduces SignRound, a novel weight quantization method for large language models that uses signed gradient descent to optimize rounding and clipping, achieving high accuracy with minimal tuning.
Contribution
SignRound combines the advantages of QAT and PTQ: it optimizes weight rounding and clipping via signed gradient descent in just 200 steps, keeping tuning costs low and adding no inference overhead.
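The contribution can be made concrete with a hedged sketch that assumes a common asymmetric per-row quantizer. V is a learnable per-weight rounding offset (assumed bounded to [-0.5, 0.5]), α and β are clipping factors on the row max and min, b is the bit width, η is the step size, and L is a layer-output reconstruction loss on calibration inputs X. These symbols and the exact form are illustrative assumptions, not quoted from the paper. The same sign update can presumably be applied to α and β.

```latex
\begin{aligned}
s &= \frac{\alpha \max(W) - \beta \min(W)}{2^{b}-1}, \qquad zp = \operatorname{round}\!\left(-\frac{\beta \min(W)}{s}\right) \\
\tilde{W} &= s\left(\operatorname{clip}\!\left(\operatorname{round}\!\left(\frac{W}{s} + V\right) + zp,\ 0,\ 2^{b}-1\right) - zp\right) \\
\mathcal{L} &= \lVert WX - \tilde{W}X \rVert_F^2 \\
V &\leftarrow \operatorname{clip}\!\left(V - \eta\,\operatorname{sign}\!\left(\frac{\partial \mathcal{L}}{\partial V}\right),\ -0.5,\ 0.5\right)
\end{aligned}
```

Only the sign of the gradient enters the update, so each step moves every offset by exactly η regardless of gradient magnitude. With V = 0 the quantizer reduces to plain round-to-nearest.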
Findings
Achieved absolute average accuracy improvements of 6.91% to 33.22% at 2 bits.
Demonstrated near-lossless 4-bit quantization in most scenarios.
Effective across models and tasks, with minimal tuning effort.
Abstract
Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks, but their deployment poses significant challenges due to substantial memory and storage requirements. Weight-only quantization has emerged as a promising solution, significantly reducing memory and storage needs without sacrificing much performance. In this study, we introduce SignRound, a method that leverages signed gradient descent (SignSGD) to optimize rounding values and weight clipping in just 200 steps. SignRound integrates the advantages of Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ), delivering exceptional results across 2 to 4 bits while minimizing tuning costs and avoiding additional inference overhead. For example, SignRound achieved absolute average accuracy improvements ranging from 6.91% to 33.22% at 2 bits, as measured by the average zero-shot…
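The tuning loop described in the abstract can be sketched in NumPy on a single toy linear layer. This is a minimal, hypothetical illustration, not the authors' implementation. `fake_quant`, `recon_loss`, and `signround_sketch` are names invented here. The per-row asymmetric quantizer and the default step size lr = 1/steps are assumptions. Gradients pass through rounding with a straight-through estimator. For brevity, only the rounding offsets V are tuned, while the clipping factors alpha and beta stay at 1.0.

```python
import numpy as np


def fake_quant(W, V, bits=2, alpha=1.0, beta=1.0):
    """Hypothetical per-row asymmetric quantize-dequantize of W (out, in)."""
    qmax = 2 ** bits - 1
    # Clipping factors scale the row max/min; keep 0 inside the range.
    w_max = np.maximum(W.max(axis=1, keepdims=True) * alpha, 0.0)
    w_min = np.minimum(W.min(axis=1, keepdims=True) * beta, 0.0)
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    zp = np.round(-w_min / scale)
    # V shifts each weight before rounding; clip to the integer grid.
    q = np.clip(np.round(W / scale + V) + zp, 0, qmax)
    return scale * (q - zp), scale


def recon_loss(X, W, Wq):
    """Mean squared error between full-precision and quantized layer outputs."""
    return float(np.mean((X @ Wq.T - X @ W.T) ** 2))


def signround_sketch(X, W, bits=2, steps=200, lr=None):
    """Tune rounding offsets V with signed gradient descent (assumed setup)."""
    lr = 1.0 / steps if lr is None else lr  # assumed default step size
    V = np.zeros_like(W)  # V = 0 is plain round-to-nearest (RTN)
    best_V = V.copy()
    best_loss = recon_loss(X, W, fake_quant(W, V, bits)[0])
    for _ in range(steps):
        Wq, scale = fake_quant(W, V, bits)
        err = X @ Wq.T - X @ W.T               # output error, (batch, out)
        grad_Wq = 2.0 * err.T @ X / err.size   # dL/dWq for the mean-squared loss
        grad_V = grad_Wq * scale               # STE through round (clip ignored)
        V = np.clip(V - lr * np.sign(grad_V), -0.5, 0.5)  # signed gradient step
        loss = recon_loss(X, W, fake_quant(W, V, bits)[0])
        if loss < best_loss:                   # keep the best offsets seen
            best_V, best_loss = V.copy(), loss
    return best_V, best_loss


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))    # toy weights: 8 outputs, 16 inputs
X = rng.normal(size=(64, 16))   # toy calibration batch
rtn_loss = recon_loss(X, W, fake_quant(W, np.zeros_like(W))[0])
V, tuned_loss = signround_sketch(X, W)
print(f"RTN loss {rtn_loss:.4f} -> SignSGD-tuned loss {tuned_loss:.4f}")
```

The loop starts from V = 0 (round-to-nearest) and keeps the best offsets it sees. As a result, the returned loss can never exceed the RTN loss on the calibration batch, though whether that carries over to held-out data is a separate question. Because sign-based steps have a fixed size, 200 steps with lr = 1/200 let each offset travel at most 1.0 in total. That is enough to push a weight across a rounding boundary, while the clip keeps V within [-0.5, 0.5].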
Code & Models
- 🤗 Intel/Qwen3.5-27B-int4-AutoRound · model · 17k dl · ♡ 15
- 🤗 Intel/Qwen3-Coder-Next-int4-AutoRound · model · 22k dl · ♡ 24
- 🤗 Intel/Qwen3.5-122B-A10B-int4-AutoRound · model · 101k dl · ♡ 27
- 🤗 Intel/Qwen3.5-35B-A3B-gguf-q2ks-mixed-AutoRound · model · 2.5k dl · ♡ 6
- 🤗 Intel/Qwen3.5-122B-A10B-gguf-q2ks-mixed-AutoRound · model · 476 dl · ♡ 4
- 🤗 Intel/Qwen3.5-397B-A17B-int4-AutoRound · model · 42k dl · ♡ 13
- 🤗 Intel/Qwen3.5-9B-int4-AutoRound · model · 247k dl · ♡ 20
- 🤗 Intel/LongCat-Flash-Lite-int4-AutoRound · model · 14 dl · ♡ 2
- 🤗 Intel/Step-3.5-Flash-int4-mixed-AutoRound · model · 1.7k dl · ♡ 7
- 🤗 happypatrick/Qwen3.5-397B-A17B-heretic-int4-AutoRound · model · 425 dl · ♡ 2
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
