LoRA vs Full Fine-tuning: An Illusion of Equivalence
Reece Shuttleworth, Jacob Andreas, Antonio Torralba, Pratyusha Sharma

TL;DR
This paper compares LoRA and full fine-tuning of large language models and finds that LoRA introduces new, high-ranking singular vectors, called intruder dimensions, which affect model forgetting and performance.
Contribution
The paper demonstrates that LoRA and full fine-tuning produce weight matrices with fundamentally different spectral structure: LoRA-trained matrices contain intruder dimensions, which full fine-tuning does not produce, and these dimensions are tied to forgetting.
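One way to make the idea of an intruder dimension concrete is sketched below. It assumes a working definition that this summary does not spell out: a top singular vector of the fine-tuned matrix counts as an intruder if its largest absolute cosine similarity with every pre-trained singular vector falls below a threshold. The function name `find_intruder_dimensions`, the parameters `top_k` and `eps`, the use of left singular vectors, and the synthetic matrices are all illustrative assumptions, not the authors' code.

```python
import numpy as np

def find_intruder_dimensions(w_pre, w_tuned, top_k=5, eps=0.5):
    """Flag top singular vectors of w_tuned that match no singular vector of w_pre.

    Assumed criterion (hypothetical): max |cosine similarity| < eps.
    """
    u_pre, _, _ = np.linalg.svd(w_pre, full_matrices=False)
    u_tuned, _, _ = np.linalg.svd(w_tuned, full_matrices=False)
    # Singular vectors are unit length, so dot products are cosine similarities.
    sims = np.abs(u_pre.T @ u_tuned[:, :top_k])
    max_sim = sims.max(axis=0)
    intruders = [j for j in range(top_k) if max_sim[j] < eps]
    return intruders, max_sim

# Synthetic "pre-trained" matrix with known, well-separated singular values.
rng = np.random.default_rng(0)
d_out, d_in = 64, 32
q_left, _ = np.linalg.qr(rng.standard_normal((d_out, d_out)))
q_right, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))
u_true = q_left[:, :d_in]            # pre-trained left singular vectors
u_new = q_left[:, d_in]              # direction orthogonal to all of them
sing_vals = np.linspace(10.0, 1.0, d_in)
w_pre = u_true @ np.diag(sing_vals) @ q_right.T

# Synthetic "fine-tuned" matrix: a large rank-1 update along the new direction.
w_tuned = w_pre + 50.0 * np.outer(u_new, q_right[:, 0])

intruders, max_sim = find_intruder_dimensions(w_pre, w_tuned)
print("intruder indices:", intruders)
print("max cosine similarity per top vector:", np.round(max_sim, 3))
```

In this toy setup only the leading singular vector of the updated matrix is flagged, because the rank-1 update points in a direction the pre-trained matrix never used. The threshold and the number of vectors inspected are free choices here and would need to be tuned for real weight matrices.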
Findings
LoRA introduces high-ranking intruder dimensions (new singular vectors with large singular values) that are not present after full fine-tuning.
Intruder dimensions are causally linked to model forgetting.
Scaling down intruder dimensions improves pre-training distribution modeling with minimal performance loss.
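The scaling intervention in the last finding can be sketched as a rank-1 correction to the fine-tuned matrix. The sketch assumes the intervention multiplies the contribution of a single singular direction by a factor `lam`, so that `lam = 1` leaves the matrix unchanged and `lam = 0` removes the direction. The function name, the factor name, and the random test matrix are hypothetical, and the index of the intruder dimension is taken as given.

```python
import numpy as np

def scale_singular_direction(w_tuned, index, lam):
    """Return w_tuned with one singular direction's contribution scaled by lam."""
    u, s, vt = np.linalg.svd(w_tuned, full_matrices=False)
    # Rank-1 component of w_tuned along the chosen singular direction.
    rank_one = s[index] * np.outer(u[:, index], vt[index])
    # Adding (lam - 1) * component leaves lam * component in the matrix.
    return w_tuned + (lam - 1.0) * rank_one

rng = np.random.default_rng(0)
w = rng.standard_normal((6, 4))       # stand-in for a fine-tuned weight matrix
s_before = np.linalg.svd(w, compute_uv=False)

w_scaled = scale_singular_direction(w, index=0, lam=0.5)
s_after = np.linalg.svd(w_scaled, compute_uv=False)

print("singular values before:", np.round(s_before, 3))
print("singular values after: ", np.round(s_after, 3))
```

Only the chosen singular value changes (it is halved here); the other singular values and all singular vectors are untouched, which is what makes this a targeted edit rather than a general shrinkage of the update.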
Abstract
Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to effectively fine-tune LLMs with an extreme reduction in trainable parameters. But, are their learned solutions really equivalent? We study how LoRA and full fine-tuning change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure: weight matrices trained with LoRA have new, high-ranking singular vectors, which we call intruder dimensions, while those trained with full fine-tuning do not. Further, we extend the finding that LoRA forgets less than full fine-tuning and find its forgetting is vastly localized to the intruder…
Code & Models
- 🤗 fblgit/cybertron-v4-qw7B-MGS (model) · 54 dl · ♡ 16
- 🤗 QuantFactory/cybertron-v4-qw7B-MGS-GGUF (model) · 43 dl · ♡ 2
- 🤗 crestf411/L3.1-8B-Slush-v1.1 (model) · 18 dl · ♡ 7
- 🤗 fblgit/cybertron-v4-qw7B-UNAMGS (model) · 20 dl · ♡ 9
- 🤗 crestf411/MN-Slush (model) · 12 dl · ♡ 34
- 🤗 crestf411/Q2.5-32B-Slush (model) · 8 dl · ♡ 11
- 🤗 lucyknada/crestf411_MN-Slush-exl2 (model) · ♡ 1
- 🤗 RichardErkhov/fblgit_-_cybertron-v4-qw7B-MGS-4bits (model)
- 🤗 RichardErkhov/fblgit_-_cybertron-v4-qw7B-MGS-8bits (model)
- 🤗 MetaphoricalCode/Q2.5-32B-Slush-exl3-4bpw-hb6 (model)
Taxonomy
Topics: Embedded Systems Design Techniques · Fault Detection and Control Systems · Parallel Computing and Optimization Techniques
