LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail)
Junsu Kim, Jaeyeon Kim, Ernest K. Ryu

TL;DR
This paper provides a theoretical analysis of LoRA training, showing it converges to low-rank global minima or fails loudly, and explains why it usually succeeds in practice.
Contribution
It analyzes LoRA's loss landscape without relying on linearization arguments or highly simplified setups, and argues that zero-initialization and weight decay induce an implicit bias toward low-rank solutions.
Findings
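In the generic regime, LoRA training converges either to a global minimizer with low rank and small magnitude or to a qualitatively distinct solution with high rank and large magnitude.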
Zero-initialization and weight decay bias LoRA toward low-rank, small-magnitude minima.
Theoretical insights explain LoRA's empirical success in fine-tuning large models.
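The setup these findings refer to can be illustrated with a minimal sketch. It assumes a toy linear regression objective that is not taken from the paper: the frozen weight W0 is adapted as W0 + B A, B starts at zero (so the adapter contributes nothing at initialization), and weight decay is applied to both factors. The function name train_lora, the problem sizes, and all hyperparameters are hypothetical choices made for illustration only.

```python
import numpy as np

def train_lora(W0, X, Y, r=2, lr=0.05, weight_decay=1e-3, steps=500, seed=0):
    """Plain gradient descent on a toy LoRA objective (illustrative assumption):
    0.5/n * ||X (W0 + B A)^T - Y||_F^2 + weight_decay/2 * (||A||_F^2 + ||B||_F^2)
    """
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    d_out = W0.shape[0]
    A = rng.normal(size=(r, d_in)) / np.sqrt(d_in)  # random init for A
    B = np.zeros((d_out, r))                        # zero init, so B @ A = 0 at start
    losses = []
    for _ in range(steps):
        W = W0 + B @ A                              # adapted weight; update has rank <= r
        R = X @ W.T - Y                             # residuals, shape (n, d_out)
        reg = 0.5 * weight_decay * (np.sum(A**2) + np.sum(B**2))
        losses.append(0.5 * np.sum(R**2) / n + reg)
        G = R.T @ X / n                             # gradient of the data term w.r.t. W
        grad_B = G @ A.T + weight_decay * B         # chain rule through W = W0 + B A
        grad_A = B.T @ G + weight_decay * A
        B = B - lr * grad_B
        A = A - lr * grad_A
    return A, B, losses

# Toy data: the target differs from W0 by a rank-1 perturbation (hypothetical setup).
rng = np.random.default_rng(1)
d_in, d_out, n = 8, 6, 64
W0 = rng.normal(size=(d_out, d_in))
u = rng.normal(size=(d_out, 1)) / np.sqrt(d_out)
v = rng.normal(size=(1, d_in)) / np.sqrt(d_in)
X = rng.normal(size=(n, d_in))
Y = X @ (W0 + u @ v).T

A, B, losses = train_lora(W0, X, Y)
delta = B @ A
print("rank of learned update:", np.linalg.matrix_rank(delta))
print("Frobenius norm of learned update:", np.linalg.norm(delta))
print("objective at start and end:", losses[0], losses[-1])
```

This toy problem is far simpler than the settings the paper analyzes and does not reproduce its results. It only makes concrete what "zero-initialization", "weight decay", and the rank and magnitude of the learned update B A mean.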
Abstract
Low-rank adaptation (LoRA) has become a standard approach for fine-tuning large foundation models. However, our theoretical understanding of LoRA remains limited as prior analyses of LoRA's training dynamics either rely on linearization arguments or consider highly simplified setups. In this work, we analyze the LoRA loss landscape without such restrictive assumptions. We define two regimes: a "special regime", which includes idealized setups where linearization arguments hold, and a "generic regime" representing more realistic setups where linearization arguments do not hold. In the generic regime, we show that LoRA training converges to a global minimizer with low rank and small magnitude, or a qualitatively distinct solution with high rank and large magnitude. Finally, we argue that the zero-initialization and weight decay in LoRA training induce an implicit bias toward the low-rank,…
Taxonomy
Topics: Robotics and Automated Systems
Methods: Weight Decay
