DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression

Xiaoming Yu; Shize Tang; Guanghua Yu; Linchuan Xie; Song Liu; Jianchen Zhu; Feng Li

arXiv:2603.22324 · cs.LG · March 25, 2026

TL;DR

DAQ is a data-free post-training quantization method that preserves the knowledge acquired during post-training by optimizing for the directional fidelity of the weight deltas between the base and post-trained models. In a pilot FP8 study, it recovers style-specific capabilities that standard quantization loses.

Contribution

The paper proposes a delta-aware quantization framework that replaces reconstruction-error objectives with metrics computed directly on the weight deltas, so that post-training LLM weight compression needs only the base and post-trained weights and no calibration data.

Findings

01. Recovers style-specific capabilities lost under standard quantization.

02. Maintains general performance in a pilot FP8 quantization study.

03. Requires only the base and post-trained weight matrices for optimization; no calibration data is needed.

Abstract

We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately corrupt the small-magnitude parameter deltas ($\Delta W$) that encode post-training behavior -- an effect we analyze through the lens of quantization as implicit regularization. DAQ replaces reconstruction-based objectives with two delta-aware metrics -- Sign Preservation Rate and Cosine Similarity -- that directly optimize for directional fidelity of $\Delta W$, requiring only the base and post-trained weight matrices. In a pilot FP8 study, DAQ recovers style-specific capabilities lost under standard quantization while maintaining general performance.
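The two delta-aware metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: the delta is taken as post-trained minus base weights, the quantized delta as quantized minus base weights, Sign Preservation Rate as the fraction of entries whose delta sign is unchanged, and Cosine Similarity as the cosine between the two delta vectors. The function names and the `round_to_grid` quantizer are hypothetical; the quantizer is a toy uniform grid standing in for FP8, not the paper's method.

```python
import math


def delta(weights, base):
    """Element-wise difference weights - base (assumed definition of the delta)."""
    return [w - b for w, b in zip(weights, base)]


def sign(x):
    """Return -1, 0, or 1."""
    return (x > 0) - (x < 0)


def sign_preservation_rate(base, post, quant):
    """Fraction of entries whose delta sign survives quantization.

    Assumption: a delta that collapses to exactly zero counts as NOT preserved
    unless the reference delta was also zero.
    """
    d_ref = delta(post, base)
    d_q = delta(quant, base)
    kept = sum(1 for r, q in zip(d_ref, d_q) if sign(r) == sign(q))
    return kept / len(d_ref)


def delta_cosine_similarity(base, post, quant):
    """Cosine between the reference delta and the quantized delta."""
    d_ref = delta(post, base)
    d_q = delta(quant, base)
    dot = sum(r * q for r, q in zip(d_ref, d_q))
    norm = math.sqrt(sum(r * r for r in d_ref)) * math.sqrt(sum(q * q for q in d_q))
    return dot / norm if norm > 0 else 0.0


def round_to_grid(weights, step):
    """Toy uniform quantizer (hypothetical stand-in for FP8 rounding)."""
    return [round(w / step) * step for w in weights]


# Small deltas (a few hundredths) on top of larger base weights.
base = [0.50, -0.25, 0.75, 0.10]
post = [0.52, -0.27, 0.74, 0.13]

for step in (0.1, 0.01):
    quant = round_to_grid(post, step)
    spr = sign_preservation_rate(base, post, quant)
    cos = delta_cosine_similarity(base, post, quant)
    print(f"step={step}: SPR={spr:.2f}, cos={cos:.3f}")
```

With the coarse grid, every quantized weight stays within 0.05 of its post-trained value, yet half of the delta signs are lost, because the rounding error is larger than the deltas themselves. This is the failure mode the abstract attributes to reconstruction-based objectives, which score the quantized weights without reference to the base model.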

Taxonomy

Topics: Advanced Data Compression Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications