TL;DR
ARHQ is a post-training weight-splitting method that moves error-sensitive weight directions into a high-precision low-rank branch to improve low-bit activation-weight quantization of LLMs, demonstrated on Qwen3-4B-Thinking-2507.
Contribution
ARHQ builds an input-side residual Hessian from activation quantization residuals and applies a closed-form truncated SVD to the scaled weight matrix, which reduces error propagation in low-bit LLM quantization.
Findings
Significantly improves layer-wise SNR in quantized models.
Preserves downstream reasoning performance on ZebraLogic under aggressive quantization.
Results are reported on the Qwen3-4B-Thinking-2507 model.
Abstract
We present Activation Residual Hessian Quantization (ARHQ), a post-training weight splitting method designed to mitigate error propagation in low-bit activation-weight quantization. By constructing an input-side residual Hessian G_x from activation quantization residuals, ARHQ analytically identifies error-sensitive weight directions and isolates them in a high-precision low-rank branch. This is achieved via a closed-form truncated SVD of the scaled weight matrix W G_x^{1/2}. Experimental results on Qwen3-4B-Thinking-2507 demonstrate that ARHQ significantly improves layer-wise SNR and preserves downstream reasoning performance on ZebraLogic even under aggressive quantization. The code is available at https://github.com/BeautMoonQ/ARHQ.
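To make the splitting step concrete, here is a minimal NumPy sketch of one plausible reading of the abstract. It is not the authors' implementation. It assumes a linear layer y = W x, a symmetric per-tensor uniform activation quantizer, and a residual Hessian estimated as G_x = E[r r^T] with r = x - Q(x). The names `fake_quant` and `arhq_split`, the damping term `eps`, and the mapping of the truncated SVD back to weight space through G_x^{-1/2} are all assumptions made for illustration. Under these assumptions, the activation-error energy of the residual weights, tr(W_res G_x W_res^T), equals the squared Frobenius norm of W_res G_x^{1/2}, so removing the top singular directions of W G_x^{1/2} is the rank-k choice that minimizes it.

```python
import numpy as np


def fake_quant(x, bits=4):
    """Symmetric per-tensor uniform quantizer (an assumed stand-in quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale


def arhq_split(W, X, rank, bits=4, eps=1e-6):
    """Split W into a low-rank branch L and a residual W_res (hypothetical helper).

    W: (out_features, in_features) weight matrix.
    X: (n_samples, in_features) calibration activations.
    Returns (L, W_res, G) with L + W_res == W.
    """
    # Activation quantization residuals and their second moment G_x.
    R = X - fake_quant(X, bits)
    G = R.T @ R / X.shape[0]
    # Small damping so G is strictly positive definite (assumption).
    G = G + eps * np.trace(G) / G.shape[0] * np.eye(G.shape[0])

    # Symmetric square root and inverse square root via eigendecomposition.
    evals, evecs = np.linalg.eigh(G)
    G_half = (evecs * np.sqrt(evals)) @ evecs.T
    G_inv_half = (evecs / np.sqrt(evals)) @ evecs.T

    # Truncated SVD of the scaled weight matrix W G_x^{1/2}.
    U, S, Vt = np.linalg.svd(W @ G_half, full_matrices=False)
    top_k = (U[:, :rank] * S[:rank]) @ Vt[:rank]

    # Map the top-k component back to weight space; keep it in high precision.
    L = top_k @ G_inv_half
    return L, W - L, G


rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))
X = rng.standard_normal((512, 32))
X[:, :2] *= 20.0  # two outlier channels, as is common in LLM activations

L, W_res, G = arhq_split(W, X, rank=4)
print("error energy, full W :", np.trace(W @ G @ W.T))
print("error energy, W_res  :", np.trace(W_res @ G @ W_res.T))
```

In this sketch only `W_res` would go on to low-bit weight quantization and see quantized activations, while `L` would stay in high precision and could be stored as two thin factors of rank `rank`. How the paper actually applies the two branches at inference time is not stated in the abstract.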
