NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models
Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wei Xue, Yike Guo

TL;DR
NoRA introduces a nested low-rank adaptation method that improves parameter-efficient fine-tuning of large models by leveraging a dual-layer SVD structure, enhancing task adaptation while reducing tunable parameters.
Contribution
NoRA extends LoRA with a nested structure and SVD, effectively utilizing pre-trained weights and reducing parameters for more precise fine-tuning.
Findings
Outperforms LoRA in various tasks
Reduces tunable parameters significantly
Enhances model adaptation accuracy
Abstract
In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To address these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition (SVD), effectively leveraging original matrix knowledge while reducing tunable parameters. Specifically, NoRA freezes the outer LoRA weights and utilizes an inner LoRA design, providing enhanced control over model optimization. This approach allows the model to more precisely adapt to specific tasks while maintaining a compact parameter space. Evaluations on tasks…
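The nested structure described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the factorization `B @ (B_in @ A_in) @ A`, the plain truncated SVD used for the outer factors (the reviews indicate the paper uses an activation-aware variant, which is not reproduced here), the zero initialization of `B_in`, and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def nested_lora_init(W, r, r_inner, seed=0):
    """Hypothetical helper: frozen outer factors from a truncated SVD of W,
    plus a small trainable inner pair (assumed layout, not the paper's code)."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:r])
    B = U[:, :r] * sqrt_s                 # outer factor, shape (d_out, r), frozen
    A = sqrt_s[:, None] * Vt[:r]          # outer factor, shape (r, d_in), frozen
    B_in = np.zeros((r, r_inner))         # inner factor, trainable, zero at start (assumption)
    A_in = rng.normal(scale=0.01, size=(r_inner, r))  # inner factor, trainable
    return B, A, B_in, A_in

def delta_w(B, A, B_in, A_in):
    """Weight update produced by the nested factors."""
    return B @ (B_in @ A_in) @ A

d_out, d_in, r, r_inner = 64, 48, 8, 2
W = np.random.default_rng(1).normal(size=(d_out, d_in))
B, A, B_in, A_in = nested_lora_init(W, r, r_inner)

# With B_in at zero, the adapted weight equals the pre-trained weight.
starts_at_zero = np.allclose(delta_w(B, A, B_in, A_in), 0.0)

# Stand-in for training: give the inner factor non-zero values.
B_in = np.random.default_rng(2).normal(size=B_in.shape)
update_rank = np.linalg.matrix_rank(delta_w(B, A, B_in, A_in))

trainable_nested = B_in.size + A_in.size   # only the inner pair is trained
trainable_lora = r * (d_out + d_in)        # vanilla LoRA trains both outer factors

print(starts_at_zero, update_rank, trainable_nested, trainable_lora)
```

Under these assumptions only the inner pair is trained, so the trainable count depends on `r` and `r_inner` and not on the layer dimensions, and the rank of the update cannot exceed `min(r, r_inner)`.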
Peer Reviews
Decision·ICLR 2025 Conference Withdrawn Submission
1. The paper provides a rigorous empirical evaluation of NoRA across a diverse set of linguistic and visual tasks, demonstrating its effectiveness and efficiency. The use of multiple benchmarks and the comparison against LoRA variants in various scenarios ensure that the results are robust and generalize well across different domains. 2. The paper employs a sound methodology, with a clear problem statement and a well-defined approach to address the challenges in fine-tuning large models. The activ
1. The core of the article appears to revolve around the activation-aware matrix, which is the foundation and heart of the entire method. However, the paper seems to lack a discussion on how to confirm that the activation-aware matrix used is superior, whether there are other methods available, and how to determine whether this matrix can provide more useful information. Moreover, the approach of merely performing singular value decomposition on the activation-aware matrix and then nesting LoRA m
1. The paper is well-written and organized, with an intuitive motivation. 2. The method is clever, leveraging observations on LLMs—particularly their sensitivity to activation outliers—to propose an improved LoRA initialization. The upgrade from standard SVD to activation-aware SVD (AwSVD) enhances performance and reduces optimization difficulty. 3. The NORA structure, based on AwSVD initialization, further reduces the number of learnable parameters, enabling more efficient and lower-cost traini
1. For instruction fine-tuning tasks, the paper only compares performance under settings with extremely low learnable parameters, which shows competitive results but falls significantly short of full-rank LoRA in performance. This raises concerns about whether the primary benefits of this work apply mainly to in-domain task transfers. 2. Is it necessary to reduce the number of optimization parameters in LoRA to save memory (particularly in the optimizer) or training time? After all, we don’t alw
+ The paper is well-structured and logically organized. + While some components are inspired by prior work, the integration of these elements is novel. + SoTA performance and low budgets. + Reasonable motivations.
- The statement regarding NoRA’s rank enabling more complex non-linear transformations lacks theoretical grounding. The discussion around “expressiveness” in that section is underdeveloped. Simply stating that NoRA’s rank is limited by \min(r, r') does not elucidate how or why this rank impacts expressiveness. Thus, the authors should provide more experiments or theoretical explanations to demonstrate their claims. - The decision to freeze the outer LoRA parameters to “maintain stabili
The experiments are extensive. The procedure of the method is clear.
Too many weaknesses led me to choose to reject this paper. Furthermore, I believe this paper requires at least one major revision before it can be considered for a top-tier conference. 1. Lines 43-44. “which can lead to slow convergence and potential overfitting problems.” The authors claimed two main approaches have emerged to address the aforementioned issues. However, DoRA cannot address them [1-2]. Indeed, DoRA sometimes fails to converge. 2. Line 12, “but it still necessitates
**1. Clear and Well-Structured Writing** - The paper is well-written, with a clear and logical structure that makes it easy to follow. Concepts are explained in a straightforward manner, and the overall organization helps the reader grasp the technical content effectively. - Figures and illustrations are clean, well-labeled, and support the text, helping to visually convey the architecture and results clearly. **2. Innovative Techniques for Performance Improvement** - The proposed nested LoRA
**1. Limited Scope of Comparative Analysis** - The unified design space presented in the paper is not comprehensive enough. It primarily focuses on VeRA and LoRA-XS approaches, lacking coverage of other significant approaches in this domain. - A comprehensive table summarizing the design choices of previous works is missing. Such a table would enhance the clarity and depth of comparisons. - The comparison provided in Figure 2 resembles an ablation study of the proposed techniques rather than a
Taxonomy
Topics: Speech Recognition and Synthesis · Advanced Neural Network Applications · Brain Tumor Detection and Classification
