LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently
Yuanhe Zhang, Fanghui Liu, Yudong Chen

TL;DR
This paper introduces LoRA-One, a theoretically grounded and efficient fine-tuning method for large language models that aligns adapters with the full gradient, leading to improved performance across multiple benchmarks.
Contribution
We provide a rigorous theoretical analysis of LoRA adapters, propose the LoRA-One algorithm leveraging one-step full gradients, and demonstrate its empirical superiority over existing methods.
Findings
LoRA-One achieves significant performance gains on NLP, reasoning, and code generation benchmarks.
Theoretically, LoRA-One ensures linear convergence and better generalization.
Proper initialization with the full gradient aligns adapters with singular subspaces, enhancing fine-tuning efficiency.
Abstract
This paper explores how theory can guide and enhance practical algorithms, using Low-Rank Adaptation (LoRA, Hu et al. 2022) in large language models as a case study. We rigorously prove that, under gradient descent, LoRA adapters align with specific singular subspaces of the one-step full fine-tuning gradient. This result suggests that, by properly initializing the adapters using the one-step full gradient, subspace alignment can be achieved immediately and applicable to both linear and nonlinear models. Building on our theory, we propose a theory-driven algorithm, LoRA-One, where the linear convergence (as well as generalization) is built and incorporating preconditioners theoretically helps mitigate the effects of ill-conditioning. Besides, our theory reveals connections between LoRA-One and other gradient-alignment-based methods, helping to clarify misconceptions in the design of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsSparse and Compressive Sensing Techniques · Medical Image Segmentation Techniques · Stochastic Gradient Optimization Techniques
