AdaSVD: Adaptive Singular Value Decomposition for Large Language Models

Zhiteng Li; Mingyuan Xia; Jingyuan Zhang; Zheng Hui; Haotong Qin; Linghe Kong; Yulun Zhang; Xiaokang Yang

arXiv:2502.01403 · cs.CV · September 26, 2025

TL;DR

AdaSVD is an adaptive SVD-based compression method for large language models. It compensates for SVD truncation errors and assigns layer-specific compression ratios, which the authors report yields better accuracy than prior SVD-based methods at a reduced memory footprint.

Contribution

AdaSVD introduces adaptive error compensation and layer-wise compression ratios, improving upon existing SVD-based LLM compression techniques.

Findings

01. Outperforms state-of-the-art SVD-based methods across multiple models.
02. Achieves significant memory reduction with minimal performance loss.
03. Demonstrates effectiveness across various LLM and VLM benchmarks.

Abstract

Large language models (LLMs) have achieved remarkable success in natural language processing (NLP) tasks, yet their substantial memory requirements present significant challenges for deployment on resource-constrained devices. Singular Value Decomposition (SVD) has emerged as a promising compression technique for LLMs, offering considerable reductions in memory overhead. However, existing SVD-based methods often struggle to effectively mitigate the errors introduced by SVD truncation, leading to a noticeable performance gap when compared to the original models. Furthermore, applying a uniform compression ratio across all transformer layers fails to account for the varying importance of different layers. To address these challenges, we propose AdaSVD, an adaptive SVD-based LLM compression approach. Specifically, AdaSVD introduces adaComp, which adaptively compensates for SVD truncation…
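The truncation step the abstract refers to can be illustrated with a minimal sketch of plain truncated SVD on a single weight matrix. This is not AdaSVD itself (adaComp and the layer-wise ratio assignment are not reproduced here); the helper names `rank_for_ratio` and `truncated_svd_factors` are hypothetical, and the sketch assumes that "compression ratio" means the fraction of parameters removed from the layer.

```python
import numpy as np

def rank_for_ratio(m, n, ratio):
    """Largest rank r such that the factors (m*r + r*n parameters) keep at most
    a (1 - ratio) share of the original m*n parameters.
    Assumption: `ratio` is the fraction of parameters removed."""
    return max(1, int((1.0 - ratio) * m * n / (m + n)))

def truncated_svd_factors(W, rank):
    """Return A (m x r) and B (r x n) with A @ B the best rank-r
    approximation of W in Frobenius norm (plain SVD, no compensation)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold the kept singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Toy stand-in for one linear layer's weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))

r = rank_for_ratio(*W.shape, ratio=0.4)
A, B = truncated_svd_factors(W, r)

stored = A.size + B.size  # parameters kept after compression
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)  # truncation error
print(f"rank={r}, stored={stored}/{W.size}, relative error={rel_err:.3f}")
```

The relative error printed at the end is the truncation error that, per the abstract, existing SVD-based methods struggle to mitigate. A random Gaussian matrix has a flat singular-value spectrum, so the error here is larger than it would typically be for a matrix with fast-decaying singular values.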

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. The proposed method achieves state-of-the-art performance among SVD-based compression techniques across multiple model architectures and evaluation metrics.
2. It demonstrates practical engineering contributions, including the introduction of a stack-of-batch technique that improves memory manipulation efficiency during compression and inference.

Weaknesses

1. The methodological novelty appears limited, offering relatively few new insights to the field. The centered compression proximal objective (Eq. 5) is already well-established in prior literature and has a known closed-form optimal solution. Empirically solving it through iterative optimization lacks theoretical justification, and it remains unclear why a suboptimal solution yields better empirical performance. Additionally, the layer-wise compression using importance scores has been explored

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- Clearly presents two practical improvements to SVD: post-truncation adjustment (adaComp) and layer-wise ratio allocation (adaCR). The SoB description is straightforward and easy to understand.
- Provides a thorough comparison with several SVD baselines (SVD/FWSVD/ASVD/SVD-LLM (v1)) across multiple ratios and LM/VLM benchmarks.
- Implementation details including whitening and a 256-sample calibration set are provided.

Weaknesses

- Limited novelty. Both components mostly build on existing ideas: (a) updating low-rank factors using calibration data is a standard post-truncation method, and (b) non-uniform, layer-wise rank allocation has been studied before (e.g., SVD-LLM, SVD-LLM (v2)). The “importance” metric is just a simple similarity measure without deeper theoretical justification.
- Narrow scope of contribution. The paper focuses on improving rank allocation and post-truncation tuning within the SVD pipeline, rather

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The proposed adaComp makes the optimization process more stable.
2. The proposed method is evaluated on both LLMs and VLMs, which shows its generality.

Weaknesses

1. Lack of evaluation on modern LLMs. All experimental results are based on older models such as LLaMA2-7B and Mistral-7B. The study should include evaluations on more recent models, such as the LLaMA3 and Qwen series, to strengthen its relevance and generalizability.
2. The authors claim that their method outperforms previous approaches across compression ratios ranging from 40% to 80%. However, the presented results only cover the range from 40% to 60%, leaving the higher ratios unverified.
3.

Reviewer 04 · Rating 2 · Confidence 5

Strengths

1. This paper is well-written and easy to follow. The careful illustrations and text significantly help readers understand the idea of the paper.
2. The experiments are extensive and comprehensive. They cover multiple LLMs from different LLM families as well as different downstream tasks.
3. This paper presents an in-depth analysis of the proposed method, and the overall assessment is good, providing convincing evidence of the superiority of the proposed method

Weaknesses

1. Lack of novelty and seemingly incremental contribution. *AdaComp* is almost the same as the early version of SVD-LLM (https://arxiv.org/pdf/2403.07378v1), which also adopts a closed-form update to the decomposed matrix. Additionally, the compression ratio allocation strategy in *AdaCR* is not new. This naive compression ratio allocation strategy appears in many submissions to previous conferences. It originally comes from *Outlier Weighed Layerwise Sparsity (OWL): A Missing Secr

Code & Models

Repositories

zhitengli/adasvd · PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques