LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently

Yuanhe Zhang; Fanghui Liu; Yudong Chen

arXiv:2502.01235 · stat.ML · June 24, 2025

TL;DR

This paper introduces LoRA-One, a theoretically grounded and efficient fine-tuning method for large language models that aligns adapters with the full gradient, leading to improved performance across multiple benchmarks.

Contribution

We provide a rigorous theoretical analysis of LoRA adapters, propose the LoRA-One algorithm leveraging one-step full gradients, and demonstrate its empirical superiority over existing methods.

Findings

01

LoRA-One achieves significant performance gains on NLP, reasoning, and code generation benchmarks.

02

In theory, LoRA-One attains linear convergence, together with a generalization guarantee.

03

Initializing the adapters from the one-step full gradient aligns them with that gradient's singular subspaces from the start, which makes fine-tuning more efficient.
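
The gradient-based initialization in finding 03 can be illustrated on a toy linear model. The sketch below is only an illustration under stated assumptions, not the paper's implementation: the least-squares loss, the `W0 + A @ B` parameterization, the function name `one_step_gradient_init`, and the square-root split of the singular values with a `scale` factor are all choices made here.

```python
import numpy as np

def one_step_gradient_init(W0, X, Y, rank, scale=1.0):
    """Build LoRA factors from the SVD of the one-step full gradient.

    Assumed toy setup (not from the source): a linear model Y ~ X @ (W0 + A @ B)
    with loss 0.5 / n * ||X @ W - Y||_F^2.
    """
    n = X.shape[0]
    # Full fine-tuning gradient of the loss with respect to W, evaluated at W0.
    G = X.T @ (X @ W0 - Y) / n
    # A descent step moves along -G, so decompose the negative gradient.
    U, S, Vt = np.linalg.svd(-G, full_matrices=False)
    sqrt_S = np.sqrt(S[:rank])
    # Assumed scaling: split the top singular values evenly between A and B.
    A = np.sqrt(scale) * U[:, :rank] * sqrt_S         # shape (d, rank)
    B = np.sqrt(scale) * sqrt_S[:, None] * Vt[:rank]  # shape (rank, k)
    return A, B, G

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, n, r = 8, 6, 50, 2
    W0 = rng.standard_normal((d, k))
    X = rng.standard_normal((n, d))
    # Hypothetical low-rank shift that fine-tuning should recover.
    shift = rng.standard_normal((d, r)) @ rng.standard_normal((r, k))
    Y = X @ (W0 + shift)
    A, B, G = one_step_gradient_init(W0, X, Y, rank=r)
    print(A.shape, B.shape)
```

By construction, `A @ B` is `scale` times the best rank-`rank` approximation of `-G`, so the adapters span the top singular subspaces of the one-step gradient before any training step is taken.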

Abstract

This paper explores how theory can guide and enhance practical algorithms, using Low-Rank Adaptation (LoRA, Hu et al. 2022) in large language models as a case study. We rigorously prove that, under gradient descent, LoRA adapters align with specific singular subspaces of the one-step full fine-tuning gradient. This result suggests that, by properly initializing the adapters using the one-step full gradient, subspace alignment can be achieved immediately, and that this applies to both linear and nonlinear models. Building on our theory, we propose a theory-driven algorithm, LoRA-One, for which we establish linear convergence (as well as generalization) and show that incorporating preconditioners theoretically helps mitigate the effects of ill-conditioning. Besides, our theory reveals connections between LoRA-One and other gradient-alignment-based methods, helping to clarify misconceptions in the design of…
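
The abstract does not state which preconditioners are used. As a rough illustration, the sketch below assumes the scaled-gradient-descent form that is common in low-rank factorization, where each factor's gradient is rescaled by the inverse Gram matrix of the other factor. The function name `preconditioned_step`, the `W0 + A @ B` parameterization, and the damping term `eps` are assumptions made here, not details taken from the paper.

```python
import numpy as np

def preconditioned_step(A, B, grad_W, lr=0.5, eps=1e-8):
    """One preconditioned gradient step on factors A (d x r) and B (r x k).

    grad_W is the loss gradient with respect to the full weight W = W0 + A @ B.
    """
    r = A.shape[1]
    # Chain rule through the product A @ B.
    grad_A = grad_W @ B.T
    grad_B = A.T @ grad_W
    # Assumed preconditioners: damped inverse Gram matrices of the other factor.
    P_A = np.linalg.inv(B @ B.T + eps * np.eye(r))
    P_B = np.linalg.inv(A.T @ A + eps * np.eye(r))
    A_new = A - lr * grad_A @ P_A
    B_new = B - lr * P_B @ grad_B
    return A_new, B_new
```

With `eps = 0`, this update yields the same product `A_new @ B_new` for any invertible rebalancing `A @ Q`, `inv(Q) @ B` of the factors, which is one way to see why such preconditioning can reduce sensitivity to badly scaled factors.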

Videos

LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently · SlidesLive

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Medical Image Segmentation Techniques · Stochastic Gradient Optimization Techniques