HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
Jinhao Zhang, Yunquan Zhang, Zicheng Yan, Boyang Zhang, Jun Sun, Daning Cheng

TL;DR
HeRo-Q introduces a Hessian conditioning-based framework for stable low-bit quantization of large language models, significantly improving robustness and accuracy in ultra low-bit regimes without architectural changes.
Contribution
The paper presents HeRo-Q, a novel Hessian conditioning method that reshapes the loss landscape to enhance low-bit quantization robustness for large language models.
Findings
Outperforms state-of-the-art quantization methods like GPTQ, AWQ, and SpinQuant.
Achieves 70.15% GSM8K accuracy on Llama3 8B in the ultra-low-bit W3A16 regime.
Effectively prevents logical collapse in aggressive quantization scenarios.
Abstract
Post-Training Quantization (PTQ), a mainstream model compression technique, often leads to the paradoxical 'low error, high loss' phenomenon because it focuses solely on minimizing quantization error. The root cause lies in the Hessian matrix of the LLM loss landscape: a few high-curvature directions are extremely sensitive to perturbations. To address this, we propose the Hessian Robust Quantization (HeRo-Q) algorithm, which applies a lightweight, learnable rotation-compression matrix to the weight space prior to quantization. This joint framework reshapes the loss landscape by reducing the largest Hessian eigenvalue, thereby significantly enhancing robustness to quantization noise. HeRo-Q requires no architectural modifications, incurs negligible computational overhead, and integrates seamlessly into existing PTQ pipelines. Experiments on Llama and Qwen…
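The 'low error, high loss' effect described in the abstract can be illustrated with a toy second-order model, where the loss change caused by a weight perturbation delta is approximated by 0.5 * delta^T H delta. The sketch below is not the paper's algorithm: the 2-D Hessian, the perturbation sizes, and the fixed (not learned) rotation-plus-scaling matrix `A` are all hypothetical values chosen for illustration. It also ignores how rescaling the weights would change the quantization step size.

```python
import numpy as np

# Hypothetical ill-conditioned 2-D Hessian: eigenvalues 1000 and 1,
# with eigenvectors rotated by 30 degrees away from the coordinate axes.
theta = np.pi / 6
R0 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
D = np.diag([1000.0, 1.0])
H = R0 @ D @ R0.T


def loss_increase(delta, hessian):
    """Second-order estimate of the loss change: 0.5 * delta^T H delta."""
    return 0.5 * delta @ hessian @ delta


# Perturbation A: small norm, aligned with the high-curvature direction.
delta_a = 0.1 * R0[:, 0]
# Perturbation B: five times larger norm, aligned with the flat direction.
delta_b = 0.5 * R0[:, 1]

err_a, err_b = np.linalg.norm(delta_a), np.linalg.norm(delta_b)
loss_a, loss_b = loss_increase(delta_a, H), loss_increase(delta_b, H)
print(f"A: error={err_a:.3f}, loss increase={loss_a:.3f}")
print(f"B: error={err_b:.3f}, loss increase={loss_b:.3f}")

# Illustrative reparameterization w = A v (an assumption, not the paper's
# learned matrix): rotate into the Hessian eigenbasis, then shrink the
# high-curvature axis. In v-space the Hessian becomes A^T H A.
scale = 1.0 / np.sqrt(np.diag(D))
A = R0 @ np.diag(scale)
H_v = A.T @ H @ A

lam_max_before = np.linalg.eigvalsh(H).max()
lam_max_after = np.linalg.eigvalsh(H_v).max()
print(f"largest eigenvalue: {lam_max_before:.1f} -> {lam_max_after:.1f}")
```

Perturbation A has the smaller error norm but the larger estimated loss increase, which is the paradox the abstract refers to. After the change of variables, the largest eigenvalue of the transformed Hessian is much smaller, so a perturbation of a given size in the transformed space has a more uniform effect on the loss.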
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
