Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
Boya Xiong, Shuo Wang, Weifeng Ge, Guanhua Chen, Yun Chen

TL;DR
PrinMix introduces a mathematically grounded, SVD-based quantization framework for delta compression in LLMs, optimizing error minimization and outperforming state-of-the-art methods on large models.
Contribution
It models quantization as an optimization problem, derives a key scaling mechanism, and employs ILP for optimal bit allocation, advancing delta compression techniques.
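The bit-allocation step can be illustrated with a toy 0/1 selection problem. This is a minimal sketch, not the paper's formulation: each row picks exactly one bit-width, the total number of bits must fit a budget, and the summed error estimate is minimized. The cost model (`s_i^2 * 4^(-bits)`), the function name `allocate_bits`, and the optional `max_distinct` cap are assumptions made for illustration, and exhaustive enumeration stands in for an actual ILP solver.

```python
from itertools import product

def allocate_bits(costs, widths, budget, max_distinct=None):
    """Pick one bit-width per row, minimizing total cost under a bit budget.

    costs[i][j] is the (assumed) error if row i is stored with widths[j] bits.
    Exhaustive search replaces an ILP solver; only viable for tiny instances.
    """
    best_err, best_bits = float("inf"), None
    for choice in product(range(len(widths)), repeat=len(costs)):
        bits = [widths[j] for j in choice]
        if sum(bits) > budget:
            continue  # violates the storage budget
        if max_distinct is not None and len(set(bits)) > max_distinct:
            continue  # too many distinct bit-widths
        err = sum(costs[i][j] for i, j in enumerate(choice))
        if err < best_err:
            best_err, best_bits = err, bits
    return best_bits, best_err

# Hypothetical inputs: three singular values and four candidate bit-widths.
singular_values = [4.0, 2.0, 1.0]
widths = [0, 2, 4, 8]
# Assumed cost model: error scales with s_i^2 and shrinks 4x per extra bit.
costs = [[s ** 2 * 4.0 ** (-b) for b in widths] for s in singular_values]

mixed_bits, mixed_err = allocate_bits(costs, widths, budget=8)
uniform_bits, uniform_err = allocate_bits(costs, widths, budget=8, max_distinct=1)
print(mixed_bits, mixed_err)      # mixed precision
print(uniform_bits, uniform_err)  # single bit-width for every row
```

Under this assumed cost model, the mixed allocation gives more bits to the row with the largest singular value and reaches a lower total error than the uniform allocation at the same budget.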
Findings
Outperforms SOTA Delta-CoMe by 22.3% on AIME2024
Achieves 6.1% improvement on GQA benchmark
Effectively reduces storage for 7B LLMs
Abstract
Supervised Fine-Tuning (SFT) empowers Large Language Models (LLMs) with exceptional performance on specialized tasks, but it yields dense, high-dimensional delta parameters that pose severe storage and distribution challenges. Singular Value Decomposition (SVD)-based compression offers a compact representation for such delta parameters, but existing methods adopt heuristic quantization without clarifying the underlying mechanisms, leading to poor generalizability. In this work, we propose PrinMix, a rigorous SVD-based framework that models quantization as an optimization problem, grounding the design in mathematical mechanisms. We first theoretically derive the quantization error and identify a key singular-value-dominated scaling mechanism, which mathematically proves the necessity of mixed-precision quantization. We then model the quantization scheme as a 0/1 Integer Linear Programming (ILP)…
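The singular-value scaling mentioned in the abstract can be checked numerically on a small example. The sketch below is not the PrinMix implementation: it takes the SVD of a random delta matrix, quantizes each row of `V^T` with a simple min-max uniform quantizer, and leaves `U` and the singular values unquantized. Under those assumptions, the squared Frobenius reconstruction error equals the sum over rows of `s_i^2` times that row's squared quantization error. The helper names, matrix sizes, and the per-row bit allocation are all hypothetical.

```python
import numpy as np

def quantize_row(row, bits):
    """Min-max uniform quantization of one row to `bits` bits (0 bits drops the row)."""
    if bits == 0:
        return np.zeros_like(row)
    lo, hi = row.min(), row.max()
    if hi == lo:
        return row.copy()  # constant row: nothing to quantize
    step = (hi - lo) / (2 ** bits - 1)
    return np.round((row - lo) / step) * step + lo

def compress_delta(w_base, w_ft, bits_per_row):
    """SVD the delta weights and quantize row i of V^T with bits_per_row[i] bits."""
    delta = w_ft - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    vt_q = np.stack([quantize_row(vt[i], b) for i, b in enumerate(bits_per_row)])
    return delta, u, s, vt, vt_q

rng = np.random.default_rng(0)
w_base = rng.normal(size=(16, 16))
w_ft = w_base + 0.01 * rng.normal(size=(16, 16))  # stand-in for fine-tuned weights
bits = [8] * 2 + [3] * 6 + [2] * 8                # hypothetical mixed-precision plan

delta, u, s, vt, vt_q = compress_delta(w_base, w_ft, bits)
delta_hat = (u * s) @ vt_q                        # reconstruct U diag(s) V_q^T

total_err = np.linalg.norm(delta - delta_hat) ** 2
per_row = s ** 2 * np.sum((vt - vt_q) ** 2, axis=1)
print(total_err, per_row.sum())                   # the two values should agree
```

Because each row's error is weighted by `s_i^2`, rows tied to larger singular values tolerate less quantization noise, which is one way to read the case for mixed precision.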
Peer Reviews
Decision: Submitted to ICLR 2026
- Addresses a practical problem in model distribution and storage: delta checkpoint compression for multi-task or multi-domain fine-tuned models.
- The core idea, combining low-rank and quantized residual compression, is well explored in prior works such as QLoRA, AdaLoRA, and CompAdapter. The proposed “layer-wise scaling reweighting” is a small variant of norm-based importance metrics used in parameter-efficient tuning.
- The method is entirely empirical. The paper lacks mathematical justification or analysis on how the scaling or residual quantization improves representational fidelity beyond heuristic intuition.
- Experiments are rest
* Principled objective: Explicitly minimizes a reconstruction-error surrogate in SVD space, yielding a clear justification for row-wise mixed precision of $V$ under a bit budget. The $\Sigma_{ii}^2$ scaling vs. difference decomposition is intuitive and actionable.
* Concrete optimization: Bit allocation via 0/1 ILP provides a crisp mechanism to trade off error and storage, with constraints for budget and a cap $f_{\max}$ on distinct bitwidths.
* RTC mechanism: The Reconstruction Target Cor
1. Inconsistency with “no singular-value assumptions.” The method claims to avoid empirical reliance on singular values, yet Section D.1 discards the last $k$ ranks by singular-value magnitude to accelerate quantization, explicitly invoking the “larger singular values are more important” heuristic that the paper earlier critiques. This weakens the methodological positioning and may bias comparisons.
2. Fair-budget accounting is under-specified. Results are reported at $\alpha = 1/16$, but the
**1. Strong theoretical foundation.** The work formalizes SVD-based delta-compression as an explicit quantization-error-minimization problem and proves the necessity of mixed-precision allocation, advancing the theoretical rigor of delta-compression research.
**2. Comprehensive empirical validation.** Evaluations on 7B and 14B LLMs across four domains (reasoning, math, code, vision-language) show clear and reproducible gains over Delta-CoMe, BitDelta, and low-rank baselines.
**3. Practical dep
**1. Limited scalability analysis.** While integer-linear optimization is solved once per model, reported solving times (≈ 30 min for 7B) may become impractical for larger or frequent model updates. Discussion on scaling to 70B+ models is missing.
**2. Ablation study.** Although four task types are covered, the paper lacks ablation on calibration-set size, bit-budget sensitivity, or robustness under distribution shift, which are important for real-world deployment.
**3. Computational overhead
Taxonomy
Topics: Advanced Data Compression Techniques · Algorithms and Data Compression
