LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

TL;DR
LoRA-Pro improves low-rank adaptation for foundation models by adjusting gradients to better approximate full fine-tuning, significantly narrowing performance gaps across multiple tasks.
Contribution
This paper introduces LoRA-Pro, a novel method that optimally adjusts low-rank gradients, enhancing LoRA's performance and bridging the gap with full fine-tuning.
Findings
LoRA-Pro substantially improves LoRA's performance across tasks.
Theoretical derivation of optimal gradient adjustments.
Effective across NLP, reasoning, code, and image tasks.
Abstract
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. In this paper, we first uncover a fundamental connection between the optimization processes of LoRA and full fine-tuning: using LoRA for optimization is mathematically equivalent to full fine-tuning using a low-rank gradient for parameter updates. And this low-rank gradient can be expressed in terms of the gradients of the two low-rank matrices in LoRA. Leveraging this insight, we introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of these low-rank matrices. This adjustment allows the low-rank gradient to more accurately approximate the full fine-tuning gradient, thereby narrowing the performance gap…
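The two claims in the abstract can be checked numerically on a single weight matrix. The sketch below is not the authors' implementation. It assumes the usual LoRA parameterization W = W0 + s·B·A and plain SGD, and the function names (`lora_grads`, `equivalent_gradient`, `adjusted_grads`) are hypothetical. The adjustment shown is the minimizer of the Frobenius-norm gap between the equivalent low-rank gradient and the full gradient `g`, derived here from that least-squares objective; the paper's own closed form may differ, for example by a term that leaves the equivalent gradient unchanged. `B` is drawn at random so that BᵀB is invertible, whereas LoRA normally initializes `B` to zero.

```python
import numpy as np

def lora_grads(g, A, B, s):
    """Chain-rule gradients w.r.t. A and B when W = W0 + s * B @ A and dL/dW = g."""
    return s * B.T @ g, s * g @ A.T  # (grad_A, grad_B)

def equivalent_gradient(grad_A, grad_B, A, B, s):
    """First-order change in W (per unit step) caused by stepping A and B."""
    return s * (grad_B @ A + B @ grad_A)

def adjusted_grads(grad_A, grad_B, A, B, s):
    """Least-squares adjustment (assumed form): pick (G_A, G_B) so that
    s * (G_B @ A + B @ G_A) is as close as possible to the full gradient.
    Only LoRA's own gradients are needed as inputs."""
    BtB_inv = np.linalg.inv(B.T @ B)
    AAt_inv = np.linalg.inv(A @ A.T)
    P_B = B @ BtB_inv @ B.T  # projector onto the column space of B
    G_A = BtB_inv @ grad_A / s**2
    G_B = (np.eye(B.shape[0]) - P_B) @ grad_B @ AAt_inv / s**2
    return G_A, G_B

rng = np.random.default_rng(0)
m, n, r, s = 8, 6, 2, 2.0
A = rng.normal(size=(r, n))
B = rng.normal(size=(m, r))
g = rng.normal(size=(m, n))  # stand-in for the full fine-tuning gradient

# Claim 1: one SGD step on (A, B) moves W like a step along a low-rank gradient.
lr = 1e-5
gA, gB = lora_grads(g, A, B, s)
dW = s * ((B - lr * gB) @ (A - lr * gA) - B @ A)
g_lora = equivalent_gradient(gA, gB, A, B, s)
step_gap = np.linalg.norm(dW + lr * g_lora)  # remaining term is second order in lr

# Claim 2: adjusting the two gradients brings the equivalent gradient closer to g.
GA, GB = adjusted_grads(gA, gB, A, B, s)
g_adj = equivalent_gradient(GA, GB, A, B, s)
err_lora = np.linalg.norm(g_lora - g)
err_adj = np.linalg.norm(g_adj - g)
print(f"step gap: {step_gap:.2e}, |g_lora - g|: {err_lora:.3f}, |g_adj - g|: {err_adj:.3f}")
```

Because `adjusted_grads` takes only `grad_A` and `grad_B`, the full gradient never has to be materialized in this sketch. The extra work is two r×r inverses and one m×m projector per adapted layer, which is one source of added training cost.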
Peer Reviews
Decision: ICLR 2025 Spotlight
In this paper, the optimal solution for adjusting the gradients of the low-rank matrices is derived theoretically and applied to fine-tuning in LoRA-Pro, which narrows the performance gap between LoRA and full fine-tuning. Extensive experiments on multiple tasks show that LoRA-Pro greatly improves the performance of LoRA.
1. The method is not applied to the latest LLMs. 2. LoRA-Pro introduces extra memory and computation cost during training. 3. It is not clear why the optimal gradients for the low-rank matrices do not explicitly depend on the full fine-tuning gradient.
- The motivation is clearly articulated, and the paper is well-written with well-explained ideas. - The proposed method enhances LoRA, a highly relevant technique for fine-tuning, in a straightforward manner, supported by both theoretical proofs and experimental evidence. - The experiments and ablation studies are thorough, demonstrating the effectiveness of the proposed method across diverse settings.
- The authors conducted all experiments for the different methods (for a given task) using the same learning rates. However, this approach may not provide a fair representation of optimal conditions, as the selected learning rate could work well for one method but not for others. To strengthen their findings, it would be beneficial for the authors to perform a learning rate sweep for each method. If this is too resource-intensive for all datasets, they might consider limiting the sweep to a sing
- The paper presents a novel approach by mathematically linking LoRA with full fine-tuning through gradient adjustments. - The theoretical foundation is robust, with clear derivations and optimal solutions provided for gradient adjustments. The experiments are comprehensive, covering multiple domains such as language understanding, dialogue generation, mathematical reasoning, and image classification. - The paper is well-organized.
- It would be better to test LoRA-Pro on truly large models (e.g., 70B).
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques
