Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization

YiFeng Wang; Zhun Sun; Keisuke Sakaguchi

arXiv:2605.00140 · cs.LG · May 4, 2026


TL;DR

ARHQ is a post-training quantization method that isolates error-sensitive weight directions into a high-precision low-rank branch to improve low-bit LLM quantization. It is demonstrated on Qwen3-4B-Thinking-2507.

Contribution

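ARHQ builds a residual Hessian from activation quantization residuals and applies a closed-form truncated SVD to the correspondingly scaled weight matrix. This moves error-sensitive directions into a high-precision low-rank branch and reduces error propagation in low-bit LLM quantization.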
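Why a truncated SVD of the scaled weights gives a closed-form answer can be seen with a standard whitening argument. This is a sketch under our own assumptions, not necessarily the paper's derivation: the low-rank branch $L$ sees unquantized activations $x$, the remainder $W - L$ sees quantized activations $Q(x) = x - r$, the weight-quantization error of $W - L$ is ignored, and $G_x = \mathbb{E}[r r^\top]$ is the residual Hessian (assumed positive definite). The output error is then $(W - L)\,r$, and minimizing its expected energy over rank-$k$ branches gives:

```latex
\begin{aligned}
\mathbb{E}\,\lVert (W - L)\,r \rVert_2^2
  &= \operatorname{tr}\!\big((W - L)\,G_x\,(W - L)^\top\big)
   = \big\lVert (W - L)\,G_x^{1/2} \big\rVert_F^2, \\
W G_x^{1/2} &= U \Sigma V^\top \quad \text{(SVD)}, \\
L^\star &= U_k \Sigma_k V_k^\top\, G_x^{-1/2}
  \quad \text{(Eckart--Young, } G_x \succ 0\text{)}, \\
\min_{\operatorname{rank}(L) \le k} \big\lVert (W - L)\,G_x^{1/2} \big\rVert_F^2
  &= \sum_{i > k} \sigma_i^2 .
\end{aligned}
```

Under these assumptions, the leading singular directions of $W G_x^{1/2}$ are exactly where activation residuals cause the most output error, which is why they are the ones worth keeping in high precision.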

Findings

1. Significantly improves layer-wise signal-to-noise ratio (SNR) in quantized models.
2. Preserves downstream reasoning performance on ZebraLogic under aggressive quantization.
3. Demonstrated on the Qwen3-4B-Thinking-2507 model.
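The page does not define the layer-wise SNR metric. A common convention, assumed in this sketch, is the ratio of full-precision output energy to quantization-error energy in decibels, $\mathrm{SNR} = 10\log_{10}(\lVert Y\rVert_F^2 / \lVert Y - \hat Y\rVert_F^2)$, where $Y = X W^\top$ is a layer's full-precision output and $\hat Y$ its quantized counterpart. The helpers `layer_snr_db` and `fake_quant_symmetric` are hypothetical, and the data is synthetic.

```python
import numpy as np


def layer_snr_db(y_ref: np.ndarray, y_quant: np.ndarray) -> float:
    """SNR in dB of a quantized layer output against its full-precision reference (assumed definition)."""
    y_ref = y_ref.astype(np.float64)
    signal = np.sum(y_ref ** 2)
    noise = np.sum((y_ref - y_quant) ** 2)
    return float("inf") if noise == 0 else float(10.0 * np.log10(signal / noise))


def fake_quant_symmetric(t: np.ndarray, bits: int, axis: int) -> np.ndarray:
    """Hypothetical symmetric round-to-nearest quantizer with one scale per slice along `axis`."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(t).max(axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero slices
    return np.clip(np.round(t / scale), -qmax - 1, qmax) * scale


rng = np.random.default_rng(0)
x = rng.standard_normal((256, 64))  # synthetic activations: tokens x d_in
w = rng.standard_normal((32, 64))   # synthetic weight: d_out x d_in
y_ref = x @ w.T

snr = {}
for bits in (8, 4):
    # Per-token activation scales and per-output-channel weight scales.
    y_q = fake_quant_symmetric(x, bits, axis=1) @ fake_quant_symmetric(w, bits, axis=1).T
    snr[bits] = layer_snr_db(y_ref, y_q)
    print(f"W{bits}A{bits}: {snr[bits]:.1f} dB")
```

Higher is better: lowering the bit width raises the error energy and therefore lowers the SNR, which is the quantity a method like ARHQ aims to recover layer by layer.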

Abstract

We present Activation Residual Hessian Quantization (ARHQ), a post-training weight-splitting method designed to mitigate error propagation in low-bit activation-weight quantization. By constructing an input-side residual Hessian G_x from activation quantization residuals, ARHQ analytically identifies error-sensitive weight directions and isolates them into a high-precision low-rank branch. This is achieved via a closed-form truncated SVD of the scaled weight matrix W G_x^{1/2}. Experimental results on Qwen3-4B-Thinking-2507 demonstrate that ARHQ significantly improves layer-wise signal-to-noise ratio (SNR) and preserves downstream reasoning performance on ZebraLogic even under aggressive quantization. The code is available at https://github.com/BeautMoonQ/ARHQ.
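A minimal NumPy sketch of the pipeline described in the abstract, under assumptions not taken from the paper: activations are fake-quantized with a hypothetical symmetric per-token 4-bit quantizer (`fake_quant_per_token`); G_x is the damped second-moment matrix of the residuals r = x - Q(x) over calibration tokens; and the low-rank branch is mapped back from the scaled space with G_x^{-1/2}. All function names are illustrative, not the repository's API. Quantization of the remainder weights is omitted, and the data is synthetic.

```python
import numpy as np


def fake_quant_per_token(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Hypothetical activation quantizer: symmetric round-to-nearest, one scale per token (row)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale


def residual_hessian(x: np.ndarray, bits: int = 4, damp: float = 1e-6) -> np.ndarray:
    """G_x = E[r r^T] with r = x - Q(x), averaged over calibration tokens.

    A small diagonal damping term (an assumption) keeps G_x positive definite.
    """
    r = x - fake_quant_per_token(x, bits)
    g = r.T @ r / x.shape[0]
    ridge = damp * max(float(np.mean(np.diag(g))), 1e-12)
    return g + ridge * np.eye(g.shape[0])


def sqrt_and_inv_sqrt(g: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric square root G^{1/2} and its inverse G^{-1/2} via eigendecomposition."""
    evals, evecs = np.linalg.eigh(g)
    evals = np.clip(evals, 1e-12, None)
    return (evecs * np.sqrt(evals)) @ evecs.T, (evecs / np.sqrt(evals)) @ evecs.T


def arhq_style_split(w: np.ndarray, g: np.ndarray, rank: int):
    """Split W (d_out x d_in) into a rank-`rank` branch L and the remainder W - L.

    L = U_k S_k V_k^T G^{-1/2}, where U S V^T is the SVD of the scaled matrix W G^{1/2}.
    Returns (L, W - L, singular values of W G^{1/2}).
    """
    g_sqrt, g_inv_sqrt = sqrt_and_inv_sqrt(g)
    u, s, vt = np.linalg.svd(w @ g_sqrt, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank] @ g_inv_sqrt
    return low_rank, w - low_rank, s


def residual_error(delta: np.ndarray, g: np.ndarray) -> float:
    """Expected output error caused by activation residuals: E||delta r||^2 = tr(delta G delta^T)."""
    return float(np.trace(delta @ g @ delta.T))


rng = np.random.default_rng(0)
x = rng.standard_normal((512, 64))  # synthetic calibration activations: tokens x d_in
x[:, :4] *= 20.0                    # a few synthetic outlier channels
w = rng.standard_normal((32, 64))   # synthetic weight: d_out x d_in
k = 8

g = residual_hessian(x, bits=4)
low_rank, remainder, s = arhq_style_split(w, g, rank=k)
err = residual_error(remainder, g)

# Baseline: plain truncated SVD of W, which ignores G_x.
u0, s0, vt0 = np.linalg.svd(w, full_matrices=False)
err_plain = residual_error(w - (u0[:, :k] * s0[:k]) @ vt0[:k], g)

print(f"ARHQ-style: {err:.3f}  tail sum: {np.sum(s[k:] ** 2):.3f}  plain SVD: {err_plain:.3f}")
```

The printed values illustrate the closed-form property under these assumptions. The activation-residual error left in the remainder equals the sum of the discarded squared singular values of W G_x^{1/2}. It is also never larger than what a plain truncated SVD of W leaves behind, because the plain SVD chooses directions without looking at G_x.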


Code & Models

Repositories

BeautMoonQ/ARHQ (GitHub)
