TL;DR
ERC-SVD introduces an error-controlled SVD approach for large language model compression, effectively reducing truncation loss and error propagation, leading to improved performance across various models and datasets.
Contribution
It proposes a novel post-training SVD-based compression method that leverages residual matrices and selective layer compression to enhance LLM compression effectiveness.
Findings
Outperforms existing methods on multiple benchmarks
Reduces truncation loss through residual matrix utilization
Mitigates error propagation by selective layer compression
Abstract
Large language models (LLMs) have demonstrated impressive capabilities in a wide range of downstream natural language processing tasks. Nevertheless, their considerable sizes and memory demands hinder practical deployment, underscoring the importance of developing efficient compression strategies. Singular value decomposition (SVD) decomposes a matrix into orthogonal components, enabling efficient low-rank approximation. This is particularly suitable for LLM compression, where weight matrices often exhibit significant redundancy. However, current SVD-based methods neglect the residual matrix from truncation, resulting in significant truncation loss. Additionally, compressing all layers of the model results in severe error propagation. To overcome these limitations, we propose ERC-SVD, a new post-training SVD-based LLM compression method from an error-controlled perspective.…
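The two quantities the abstract relies on, the low-rank approximation and the residual left by truncation, can be illustrated with a minimal NumPy sketch. It uses plain, unweighted SVD on a random matrix; the helper name `truncated_svd`, the matrix shape, and the rank are illustrative assumptions, not details of ERC-SVD.

```python
import numpy as np

def truncated_svd(W, r):
    """Return the rank-r truncation of W, the residual W - W_r, and all singular values.

    Hypothetical helper for illustration; plain SVD, no activation-aware scaling.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W_r = (U[:, :r] * s[:r]) @ Vt[:r, :]  # keep the r largest singular triplets
    return W_r, W - W_r, s

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))  # stand-in for a weight matrix (assumption)
W_r, residual, s = truncated_svd(W, r=16)

# The Frobenius norm of the residual equals the root sum of squares of the
# discarded singular values (a standard property of truncated SVD).
truncation_loss = np.linalg.norm(residual, "fro")
discarded = np.sqrt(np.sum(s[16:] ** 2))
print(truncation_loss, discarded)
```

The residual is exactly the part of the matrix that a plain rank-r truncation throws away, which is the quantity the abstract says existing methods neglect.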
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- Two-stage residual compensation and partial-layer compression provide a new perspective for SVD compression.
- Rigorous theory and strict experimental control ensure reliable results.
- Complex technologies are described in plain language, with formulas and pseudocode for high readability.
- Only the calculation process for residual compensation is given; there is no detailed theoretical derivation from the loss function that explains why the method is effective.
- A notable limitation of the experiments is the selection of baselines: newer SVD-based LLM compression methods that are highly relevant to the research topic, such as Dobi-SVD, are excluded.
1. The idea of partial-layer compression for SVD is novel and addresses a meaningful issue in SVD-based compression.
2. Experimental design is thorough, with comprehensive coverage of relevant evaluation points.
The paper demonstrates a lack of mathematical understanding regarding SVD: from a theoretical perspective, the proposed residual compensation for SVD truncation is equivalent to a single SVD truncation and does not provide additional benefit. The choice of partial compression ratio lacks theoretical justification or empirical explanation. Recent works such as SHEARED LLAMA use a learn-then-prune strategy for structured pruning, which is not considered here. The proposed “error” metric is also n
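The equivalence claim in this review can be checked numerically for the plain, unweighted case. The sketch below is an assumption-laden illustration, not the paper's procedure: it truncates a random matrix to rank `r`, truncates the residual to rank `r_r`, and compares the sum with a single truncation at rank `r + r_r`. The helper `rank_k` and all sizes are hypothetical. If the paper's pipeline applies the two decompositions to differently transformed matrices, this check does not cover that case.

```python
import numpy as np

def rank_k(M, k):
    """Best rank-k approximation of M via plain SVD (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))  # stand-in for a weight matrix (assumption)
r, r_r = 12, 4                     # main rank and residual rank (assumptions)

W_stage1 = rank_k(W, r)               # first truncation
W_stage2 = rank_k(W - W_stage1, r_r)  # truncation of the residual
two_stage = W_stage1 + W_stage2
one_stage = rank_k(W, r + r_r)        # single truncation, same total rank

# Difference between the two reconstructions, up to floating-point error.
gap = np.linalg.norm(two_stage - one_stage)
print(gap)
```

For plain SVD the residual of a rank-`r` truncation consists of the remaining singular triplets of `W`, so its leading `r_r` components are the next `r_r` components of `W` itself; this is why the two reconstructions coincide in this setting.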
1. Clear design; both REC and PLC contribute measurable improvements.
2. Compatible with quantization (GPTQ) and generalizes to VLMs (LLaVA).
1. **Limited theoretical depth**:
   - Eq. (5) defines the *effective scale*: $\alpha = \frac{m n}{m + n}$. But no derivation or intuition is given for why this specific harmonic-like mean is appropriate for rank scaling.
   - The per-layer rank formula: $r = (1 - R_\ell)\,\alpha$. This is heuristic and not theoretically motivated; it is unclear why the compression ratio $R_\ell$ interacts linearly with $\alpha$.
   - The residual compensation rank: $r_r = \alpha\,\beta$ (with $\beta
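One possible reading of the two formulas quoted in this review is simple parameter counting; this interpretation is an assumption here and is not attributed to the paper or the reviewer. A rank-$r$ factorization of an $m \times n$ matrix stores $r(m + n)$ values, so keeping a fraction $1 - R_\ell$ of the original $mn$ values gives $r = (1 - R_\ell)\,\frac{mn}{m + n}$. The sketch below works one example; the function names, the matrix size, the ratio, and the rounding down to an integer rank are all illustrative choices.

```python
def effective_scale(m, n):
    """alpha = m*n / (m + n), as quoted from Eq. (5) in the review."""
    return m * n / (m + n)

def rank_for_ratio(m, n, ratio):
    """r = (1 - ratio) * alpha, rounded down (the rounding is an assumption)."""
    return int((1.0 - ratio) * effective_scale(m, n))

m, n, ratio = 4096, 4096, 0.2  # illustrative sizes and compression ratio
r = rank_for_ratio(m, n, ratio)

# Values stored by two factors of shapes (m, r) and (r, n),
# relative to the m*n values of the dense matrix.
stored = r * (m + n)
kept_fraction = stored / (m * n)
print(r, kept_fraction)
```

Under this reading the linear dependence on $R_\ell$ follows from the storage cost being linear in the rank; whether that is the authors' intended motivation is not stated in the review text.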
Good experimental results.
1. Limited novelty. The core contribution is simply applying SVD twice: first to the weight matrix, then to the residual. This is not a novel algorithmic insight.
2. Lack of theoretical justification. The authors do not explain why two-stage SVD should outperform single-stage decomposition given the same rank budget. There is no analysis of which properties of the residual matrix make separate decomposition beneficial, no approximation error bounds, and no indicators of when the method works
