SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models
Yang Cao, Zhao Song

TL;DR
SORSA introduces a novel PEFT method using SVD-based adapters with orthonormal regularization, achieving faster convergence and superior performance in large language model fine-tuning.
Contribution
The paper presents SORSA, a new PEFT approach that leverages SVD and orthonormal regularization for efficient and effective large language model adaptation.
Findings
SORSA outperforms LoRA and full fine-tuning in convergence speed.
On GSM-8K, SORSA achieves 56.03% accuracy, higher than LoRA and full fine-tuning.
SORSA adapters can be merged into the base weights for inference, eliminating any added inference latency.
Abstract
In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation, or SORSA, a novel parameter-efficient fine-tuning (PEFT) method. Each SORSA adapter consists of two main parts: trainable principal singular weights $W_p$ and frozen residual weights $W_r$. These parts are initialized by performing singular value decomposition (SVD) on pre-trained weights. Moreover, we implement and analyze an orthonormal regularizer, which we prove could decrease the condition number of $W_p$ and make the optimization more efficient. SORSA adapters could be merged during inference, thus eliminating any inference latency. We also introduce a method to analyze the variation of the parameters by performing SVD and discuss and analyze SORSA's superiority in minimizing the alteration in the SVD aspect. After all,…
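The initialization and merging steps described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `sorsa_init` and `merge`, the variable names, the toy matrix size, and the rank are all assumptions made for the example.

```python
import numpy as np

def sorsa_init(W, rank):
    """Split W into a principal part (kept as SVD factors) and a residual matrix."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # The top-`rank` singular triplets would be the trainable principal weights.
    U_p, S_p, Vt_p = U[:, :rank], S[:rank], Vt[:rank, :]
    # The remaining triplets are collapsed into one residual matrix (frozen in training).
    W_r = (U[:, rank:] * S[rank:]) @ Vt[rank:, :]
    return U_p, S_p, Vt_p, W_r

def merge(U_p, S_p, Vt_p, W_r):
    """Fold the adapter back into a single dense matrix for inference."""
    return (U_p * S_p) @ Vt_p + W_r

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))          # stand-in for a pre-trained weight matrix
U_p, S_p, Vt_p, W_r = sorsa_init(W, rank=2)
W_merged = merge(U_p, S_p, Vt_p, W_r)

# Before any training, the merged matrix should reproduce the original weights.
print(np.allclose(W_merged, W))
```

Because the merged result is an ordinary dense matrix of the original shape, a layer using it does the same amount of work at inference as the unadapted layer.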
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The proposed method outperforms the other PEFT techniques such as LoRA and PiSSA in terms of accuracy on various benchmarks, showcasing its effectiveness. The results look very promising on a variety of experiments. 2. The authors also analyze the variation patterns of singular values and vectors during parameter updates and compare SORSA with other PEFT methods such as LoRA and partial fine-tuning.
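One way to picture the singular value and singular vector analysis mentioned in this review is sketched below. The metric definitions here are assumptions for illustration (mean absolute change in singular values, and one minus the mean absolute inner product between matching singular vectors); the paper's exact definitions may differ, and `svd_variation` is a hypothetical helper name.

```python
import numpy as np

def svd_variation(W0, Wt):
    """Return (singular value change, singular vector change) between W0 and Wt."""
    U0, S0, Vt0 = np.linalg.svd(W0, full_matrices=False)
    Ut, St, Vtt = np.linalg.svd(Wt, full_matrices=False)
    delta_sigma = np.mean(np.abs(St - S0))
    # Absolute inner products ignore the sign ambiguity of singular vectors.
    u_align = np.abs(np.sum(U0 * Ut, axis=0))    # column-wise dot products
    v_align = np.abs(np.sum(Vt0 * Vtt, axis=1))  # row-wise dot products
    delta_dir = 1.0 - 0.5 * (u_align.mean() + v_align.mean())
    return delta_sigma, delta_dir

rng = np.random.default_rng(0)
W0 = rng.standard_normal((8, 6))

# Pure rescaling changes the singular values but leaves the singular vectors in place.
d_sigma_scaled, d_dir_scaled = svd_variation(W0, 2.0 * W0)

# A dense random update moves both the values and the vectors.
d_sigma_noisy, d_dir_noisy = svd_variation(W0, W0 + 0.5 * rng.standard_normal(W0.shape))
```

Comparing the two quantities across methods is what lets an analysis of this kind separate updates that mostly rescale existing directions from updates that rotate them.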
1. The novelty of this paper is limited. The initialization method is from PiSSA [1], and updates in the form of singular value decomposition and the orthonormality regularizer are from AdaLoRA [2]. 2. Some symbols in Theorem 3 and Theorem 5 are used without prior definition, which can be confusing. It would be better to restate these theorems to make them more straightforward. 3. The proof of Theorem 2 is not right. Lines 1035-1036: "This L is finite because the Frobenius norms of U and V are bounded
The paper reports superior performance for LLMs fine-tuned with SORSA compared to the PiSSA method. Additionally, the proposed approach is relatively easy to implement. The originality of this work lies in combining the ideas behind AdaLoRA and PiSSA, as described in the "Summary" section.
Firstly, the paper suffers from poor writing quality, with numerous grammatical errors and awkward English expressions (too many to list exhaustively in this review). The text is also poorly structured, with some mathematical symbols left undefined. The figures use such small font sizes that they are almost unreadable, even in the electronic version where one can zoom in. More detailed feedback is provided below. The paper appears to be written by an inexperienced author, so I would recommend se
1. While orthonormal regularization on incremental low-rank matrices has been used in existing works, the analysis of the condition number is novel to me, and the improved stability makes sense. 2. The approach is simple and effective, as it directly applies orthonormal regularization.
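The link between orthonormality and conditioning that this review refers to can be illustrated with a small sketch. The penalty below (Frobenius-norm distance of the Gram matrices from the identity) is an assumed form chosen for illustration; the paper's exact regularizer, its weighting, and the theorem about the condition number are not reproduced here. `orthonormal_penalty` is a hypothetical name.

```python
import numpy as np

def orthonormal_penalty(U_p, Vt_p):
    """Distance of the singular-vector factors from having orthonormal columns/rows."""
    eye = np.eye(U_p.shape[1])
    return (np.linalg.norm(U_p.T @ U_p - eye, "fro")
            + np.linalg.norm(Vt_p @ Vt_p.T - eye, "fro"))

rng = np.random.default_rng(0)
U, S, Vt = np.linalg.svd(rng.standard_normal((8, 6)), full_matrices=False)
U_p, Vt_p = U[:, :2], Vt[:2, :]

# Factors taken straight from an SVD are orthonormal, so the penalty is near zero
# and the factor itself has condition number 1.
penalty_init = orthonormal_penalty(U_p, Vt_p)
cond_init = np.linalg.cond(U_p)

# Simulate unregularized training drift with a random perturbation (illustrative only).
U_drift = U_p + 0.3 * rng.standard_normal(U_p.shape)
penalty_drift = orthonormal_penalty(U_drift, Vt_p)
cond_drift = np.linalg.cond(U_drift)
```

In this toy setting the penalty and the condition number of the factor rise together once the columns stop being orthonormal, which is the intuition behind penalizing the deviation during training.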
1. The method is not well-motivated. The authors begin with an analysis of singular values and vectors, which shows a different updating pattern of SORSA compared to other methods. However, it is unclear how this is connected to the limitation of the generalization ability of LoRA and FT. I can only observe a limitation of the learning capacity of LoRA and FT. Additionally, it is not clear why the orthonormal regularization leads to a different updating pattern, as shown in Figure 2, and why thi
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Singular Values and Orthonormal Regularized Singular Vectors Adaptation
