D-QRELO: Training- and Data-Free Delta Compression for Large Language Models via Quantization and Residual Low-Rank Approximation

Junlin Li; Shuangyong Song; Guodong Du; Ngai Wong; Xuebo Liu; Yongxiang Li; Min Zhang; Jing Li; Xuelong Li

arXiv:2604.16940 · cs.LG · April 21, 2026

TL;DR

D-QRELO introduces a novel training- and data-free delta compression method. It combines quantization with residual low-rank approximation to compress the delta weights of fine-tuned large language models, and it stays effective on models fine-tuned with large-scale datasets.

Contribution

It proposes D-QRELO, a new delta compression technique that outperforms existing methods on models fine-tuned with large-scale data, without requiring any additional training or data.

Findings

01

D-QRELO achieves superior compression performance across various LLM architectures.

02

Larger fine-tuning datasets increase delta parameter magnitude and entropy, complicating compression.

03

Design principles for delta compression are established based on empirical analysis of model and task factors.
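Finding 02 reports that larger fine-tuning datasets raise the magnitude and entropy of the delta, and the abstract adds its singular values. The page does not say how these quantities are measured. The following is a minimal NumPy sketch under assumed definitions:

- magnitude is the mean absolute value of the delta entries;
- singular values come from `np.linalg.svd`;
- entropy is the Shannon entropy of the delta entries after binning them at a fixed width.

`delta_statistics` and `bin_width` are hypothetical names. The scaled random delta is synthetic and is not data from the paper.

```python
import numpy as np


def delta_statistics(w_base: np.ndarray, w_ft: np.ndarray, bin_width: float = 1e-4) -> dict:
    """Summarize a delta matrix. Hypothetical helper; metric definitions are assumptions."""
    delta = w_ft - w_base
    # Magnitude: mean absolute value of the delta entries (one possible choice).
    magnitude = float(np.mean(np.abs(delta)))
    # Singular values, returned by NumPy in descending order.
    singular_values = np.linalg.svd(delta, compute_uv=False)
    # Entropy: assign each entry to a fixed-width bin, then take the Shannon
    # entropy (in bits) of the bin occupancy. Fixed-width bins make this
    # measure grow as the entries spread out.
    bin_ids = np.floor(delta / bin_width).astype(np.int64)
    _, counts = np.unique(bin_ids, return_counts=True)
    p = counts / counts.sum()
    entropy_bits = float(-(p * np.log2(p)).sum())
    return {
        "magnitude": magnitude,
        "top_singular_value": float(singular_values[0]),
        "entropy_bits": entropy_bits,
    }


# Usage: one random direction at two scales, standing in for a "small" and a
# "large" delta (synthetic data, not results from the paper).
rng = np.random.default_rng(0)
w_base = rng.normal(0.0, 0.02, size=(64, 64))
direction = rng.normal(0.0, 1.0, size=(64, 64))
small = delta_statistics(w_base, w_base + 1e-3 * direction)
large = delta_statistics(w_base, w_base + 1e-2 * direction)
print(small)
print(large)
```

Scaling the same delta by 10 multiplies the magnitude and the singular values by 10. It also spreads the entries over more fixed-width bins, so the binned entropy rises too. Under these assumed definitions, the three quantities therefore grow together.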

Abstract

Supervised Fine-Tuning (SFT) accelerates the development of task-specific large language models (LLMs), but the resulting proliferation of fine-tuned models incurs substantial memory overhead. Delta compression addresses this by retaining a single pre-trained LLM together with multiple compressed delta weights. However, existing methods fail on models fine-tuned with large-scale datasets. We find that a larger SFT data scale amplifies the magnitude, singular values, and entropy of the delta parameters, exacerbating compression errors. To tackle this, we propose D-QRELO (Delta Compression via Quantization and Residual Low-Rank), a novel training- and data-free delta compression method. It first applies coarse-grained one-bit quantization to capture the dominant structure of the delta. It then applies compensated residual low-rank approximation to recover fine-grained details from the smaller residual error. Experiments on various…
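The two-stage idea in the abstract (a one-bit quantization of the delta, then a low-rank approximation of what remains) can be illustrated with a small NumPy sketch. The abstract does not specify the quantizer, its scales, or the "compensation" applied before the low-rank step, so this sketch makes its own choices:

- it assumes a per-row sign quantizer with a mean-absolute-value scale;
- it approximates the residual with a plain truncated SVD;
- it omits compensation entirely.

This is not the authors' algorithm. `one_bit_quantize`, `residual_low_rank`, `compress_delta`, `reconstruct`, and `rank` are hypothetical names.

```python
import numpy as np


def one_bit_quantize(delta: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Sign quantization with one scale per row (an assumed scheme)."""
    signs = np.where(delta >= 0, 1, -1).astype(np.int8)
    # For a fixed sign pattern, the row-wise mean |delta| is the L2-optimal scale.
    scale = np.mean(np.abs(delta), axis=1, keepdims=True)
    return signs, scale


def residual_low_rank(residual: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Best rank-`rank` approximation of the residual via truncated SVD."""
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b


def compress_delta(w_base: np.ndarray, w_ft: np.ndarray, rank: int):
    delta = w_ft - w_base
    signs, scale = one_bit_quantize(delta)    # coarse, dominant structure
    residual = delta - scale * signs          # what one bit per entry cannot express
    a, b = residual_low_rank(residual, rank)  # fine-grained correction (no compensation here)
    return signs, scale, a, b


def reconstruct(w_base, signs, scale, a, b) -> np.ndarray:
    return w_base + scale * signs + a @ b


def relative_error(w_base, w_ft, w_hat) -> float:
    return float(np.linalg.norm(w_ft - w_hat) / np.linalg.norm(w_ft - w_base))


# Usage on a synthetic delta (not data from the paper).
rng = np.random.default_rng(0)
w_base = rng.normal(0.0, 0.02, size=(128, 64))
w_ft = w_base + 0.01 * rng.normal(size=(128, 64))
errors = {}
for rank in (0, 8, 32, 64):
    w_hat = reconstruct(w_base, *compress_delta(w_base, w_ft, rank))
    errors[rank] = relative_error(w_base, w_ft, w_hat)
print(errors)
```

In this sketch, each matrix is stored as three parts:

- one sign per entry, which could be bit-packed, for example with `np.packbits`;
- one float scale per row;
- `rank * (m + n)` floats for the low-rank factors.

The raw delta, by comparison, needs `m * n` floats. The truncated SVD is the best rank-`rank` approximation of the residual (Eckart–Young), so a higher rank never increases the error. At full rank, the delta is recovered exactly.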
