Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate

Jaemin Kim; Sungkyun Kim; Junyeol Lee; Jiwon Seo

arXiv:2604.13806 · cs.LG · April 16, 2026

TL;DR

DASH-Q is a robust post-training quantization method for large language models that improves accuracy at ultra low-bit widths by filtering noise and preserving salient features using a diagonal Hessian approximation.

Contribution

The paper introduces DASH-Q, a PTQ framework that combines a diagonal Hessian approximation with iterative weighted least squares to keep quantization stable at ultra low bit-widths.

Findings

01. Outperforms existing PTQ methods in ultra low-bit regimes.

02. Improves zero-shot accuracy by up to 14.01% over strong baselines.

03. Maintains robust performance with very small calibration datasets.

Abstract

Large Language Models (LLMs) are widely used across many domains, but their scale makes deployment challenging. Post-Training Quantization (PTQ) reduces the memory footprint without retraining by leveraging a small calibration set. Recent Hessian-based PTQ methods compensate for quantization error via cross-channel dependencies, but such approaches degrade at low bit-widths because curvature estimates from limited calibration data are noisy. We propose DASH-Q, a robust PTQ framework that uses a diagonal Hessian approximation and iterative weighted least squares. By discarding noise-prone dependencies, DASH-Q filters sampling noise while prioritizing the preservation of salient feature power. We outperform other PTQ baselines in the ultra low-bit regime, improving zero-shot accuracy by 7.01% on average and up to 14.01% over the strongest baselines across five baseline LLMs, while showing robust and…
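The abstract names two ingredients, a diagonal Hessian approximation and iterative weighted least squares, without giving the algorithm. The following is a minimal sketch of how those two ideas can fit together for a single weight row, not the authors' implementation. It assumes the common layer-wise convention in which the Hessian is proportional to X^T X for calibration inputs X, keeps only its diagonal, and alternates between rounding and a closed-form weighted least-squares update of one per-row scale. The function names (`diag_hessian`, `quantize_row`, `weighted_error`), the symmetric integer grid, and the single shared scale are all illustrative assumptions.

```python
def diag_hessian(X):
    """Diagonal of X^T X: per-input-channel sum of squared activations.

    X is a list of calibration rows (assumption: one row per token).
    Off-diagonal (cross-channel) terms are deliberately dropped.
    """
    n_in = len(X[0])
    return [sum(row[j] ** 2 for row in X) for j in range(n_in)]


def quantize_row(w, h, bits=2, iters=10):
    """Alternate rounding with a weighted least-squares scale update.

    Minimizes sum_j h[j] * (w[j] - scale * q[j])**2 over an integer
    grid q[j] in [qmin, qmax] and one scale (assumes bits >= 2).
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    scale = (max(abs(v) for v in w) / qmax) or 1.0  # max-abs initialization

    def round_to_grid(s):
        return [min(qmax, max(qmin, round(v / s))) for v in w]

    for _ in range(iters):
        q = round_to_grid(scale)
        # Closed-form minimizer of the weighted objective for fixed q.
        num = sum(hj * wj * qj for hj, wj, qj in zip(h, w, q))
        den = sum(hj * qj * qj for hj, qj in zip(h, q))
        if den == 0 or num <= 0:
            break  # degenerate case: keep the current scale
        new_scale = num / den
        if abs(new_scale - scale) < 1e-12:
            break  # converged
        scale = new_scale
    return scale, round_to_grid(scale)


def weighted_error(w, h, scale, q):
    """Diagonal-curvature-weighted squared reconstruction error."""
    return sum(hj * (wj - scale * qj) ** 2 for hj, wj, qj in zip(h, w, q))


# Toy usage with made-up numbers (not data from the paper).
X = [[1.0, 0.1, 2.0], [0.5, -0.2, 1.5]]
w = [0.9, -0.4, 0.3]
h = diag_hessian(X)
scale, q = quantize_row(w, h, bits=2)
print(scale, q, weighted_error(w, h, scale, q))
```

Because each step (rounding for a fixed scale, then solving for the scale with fixed integers) minimizes the same weighted objective, the error in this sketch never increases from one iteration to the next. Channels with large activation energy `h[j]` dominate the fit, which is one plausible reading of "preserving salient feature power"; how DASH-Q actually defines and weights saliency is not stated in the abstract.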
