The Impact of Initialization on LoRA Finetuning Dynamics
Soufiane Hayou, Nikhil Ghosh, Bin Yu

TL;DR
This paper investigates how the choice of initialization in LoRA finetuning affects model performance and learning dynamics. It finds that initializing B to zero and A randomly generally yields better results, because this scheme tolerates larger learning rates and therefore learns more efficiently.
Contribution
The study provides a theoretical and empirical comparison of the two LoRA initialization schemes. It shows that zero-initializing B leads to better performance on average and permits larger learning rates.
Findings
Zero-initializing B (with A random) improves performance on average over the alternative scheme (A zero, B random).
Larger learning rates are feasible when B is zero-initialized, which makes learning more efficient.
Experimental validation on large language models supports the theoretical analysis.
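A one-step gradient calculation shows why the two schemes are not interchangeable, even though both start at the pretrained weights. The worked equations below are a standard chain-rule derivation, not the paper's own analysis. They assume the LoRA update W = W_0 + sBA with scaling s = α/r as in Hu et al. (2021), plain SGD with learning rate η, and G denoting the loss gradient with respect to W at W_0. The first update right-multiplies G by A_0ᵀA_0 (input side) in scheme (i), but left-multiplies it by B_0B_0ᵀ (output side) in scheme (ii).

```latex
\begin{aligned}
W &= W_0 + s\,BA, \qquad s = \alpha / r,\quad B \in \mathbb{R}^{n\times r},\ A \in \mathbb{R}^{r\times k},\quad G = \nabla_W \mathcal{L}\big|_{W_0} \in \mathbb{R}^{n\times k} \\
\nabla_B \mathcal{L} &= s\,G A^\top, \qquad \nabla_A \mathcal{L} = s\,B^\top G \\
\text{(i) } B_0 = 0,\ A_0 \text{ random:} \quad B_1 &= -\eta\, s\, G A_0^\top,\quad A_1 = A_0,\quad \Delta W_1 = s\,B_1 A_1 = -\eta\, s^2\, G A_0^\top A_0 \\
\text{(ii) } A_0 = 0,\ B_0 \text{ random:} \quad A_1 &= -\eta\, s\, B_0^\top G,\quad B_1 = B_0,\quad \Delta W_1 = s\,B_1 A_1 = -\eta\, s^2\, B_0 B_0^\top G
\end{aligned}
```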
Abstract
In this paper, we study the role of initialization in Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021). Essentially, to start finetuning from the pretrained model, one can either initialize B to zero and A to random (the default initialization in the PEFT package), or vice versa. In both cases, the product BA is equal to zero at initialization, so finetuning starts from the pretrained model. These two initialization schemes are seemingly similar, and they should in principle yield the same performance and share the same optimal learning rate. We demonstrate that this is an incorrect intuition and that the first scheme (initializing B to zero and A to random) on average yields better performance compared to the other scheme. Our theoretical analysis shows that the reason behind this might be that the first initialization allows the use of larger…
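The two schemes described in the abstract can be illustrated on a single linear layer. This is a minimal NumPy sketch, not the authors' code or the PEFT implementation. It makes three assumptions: Gaussian initialization with standard deviation 0.02 for the random factor, the α/r scaling from Hu et al. (2021), and a squared-error loss. The functions `init_lora`, `lora_forward`, and `lora_grads` are hypothetical helpers. The sketch checks that both schemes reproduce the pretrained output at initialization, and shows that at the first step only the zero-initialized factor receives a nonzero gradient.

```python
import numpy as np

# Hypothetical helpers for illustration only (not the PEFT API).
# Assumptions: Gaussian init (std=0.02) for the random factor,
# LoRA scaling s = alpha / r, squared-error loss on one linear layer.


def init_lora(d_out, d_in, r, scheme, rng, std=0.02):
    """Return (B, A) with B of shape (d_out, r) and A of shape (r, d_in)."""
    if scheme == "zero_B":    # first scheme: B = 0, A random
        B = np.zeros((d_out, r))
        A = rng.normal(0.0, std, size=(r, d_in))
    elif scheme == "zero_A":  # second scheme: A = 0, B random
        B = rng.normal(0.0, std, size=(d_out, r))
        A = np.zeros((r, d_in))
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return B, A


def lora_forward(W0, B, A, x, alpha, r):
    """Output of the adapted layer W0 + (alpha / r) * B @ A; x is (d_in, batch)."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))


def lora_grads(W0, B, A, x, y, alpha, r):
    """Gradients of L = 0.5 * ||f(x) - y||^2 with respect to B and A."""
    s = alpha / r
    G = lora_forward(W0, B, A, x, alpha, r) - y  # dL/d(output), (d_out, batch)
    dW = G @ x.T                                 # dL/dW_eff, (d_out, d_in)
    dB = s * dW @ A.T                            # chain rule through B @ A
    dA = s * B.T @ dW
    return dB, dA


rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 6, 2, 16
W0 = rng.normal(size=(d_out, d_in))  # stand-in for a pretrained weight
x = rng.normal(size=(d_in, 4))
y = rng.normal(size=(d_out, 4))

for scheme in ("zero_B", "zero_A"):
    B, A = init_lora(d_out, d_in, r, scheme, rng)
    # Both schemes give BA = 0, so the adapted layer matches the pretrained one.
    assert np.allclose(lora_forward(W0, B, A, x, alpha, r), W0 @ x)
    dB, dA = lora_grads(W0, B, A, x, y, alpha, r)
    # Only the zero-initialized factor gets a nonzero gradient at step 0.
    print(scheme, "dB nonzero:", np.linalg.norm(dB) > 0,
          "dA nonzero:", np.linalg.norm(dA) > 0)
# prints:
# zero_B dB nonzero: True dA nonzero: False
# zero_A dB nonzero: False dA nonzero: True
```

This example only shows the asymmetry at initialization. It does not reproduce the paper's learning-rate analysis or its large-model experiments.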
