TL;DR
This paper studies how initializing both LoRA matrices with non-zero values affects fine-tuning dynamics. It finds that non-zero initialization makes LoRA more robust to suboptimal learning rates and that fine-tuning need not start exactly from the pretrained weights. Both conclusions are supported by theoretical analysis and experiments.
Contribution
The paper provides the first theoretical analysis of non-zero initialization in LoRA and shows that it improves robustness to the choice of learning rate while relaxing the requirement that fine-tuning start from the pretrained model.
Findings
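Non-zero initialization improves robustness to suboptimal learning rates, particularly smaller ones.
The random noise that non-zero initialization adds to the pretrained weight generally does not harm fine-tuning performance.
The infinite-width theoretical analysis agrees with the empirical results.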
Abstract
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method. In standard LoRA layers, one of the matrices, A or B, is initialized to zero, ensuring that fine-tuning starts from the pretrained model. However, there is no theoretical support for this practice. In this paper, we investigate the impact of non-zero initialization on LoRA's fine-tuning dynamics from an infinite-width perspective. Our analysis reveals that, compared to zero initialization, simultaneously initializing A and B to non-zero values improves LoRA's robustness to suboptimal learning rates, particularly smaller ones. Further analysis indicates that although the non-zero initialization of AB introduces random noise into the pretrained weight, it generally does not affect fine-tuning performance. In other words, fine-tuning does not need to strictly start from the pretrained model. The…
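To make the difference between the two initialization schemes concrete, here is a minimal pure-Python sketch. It is not the paper's implementation. The shape convention (A is d_out x rank, B is rank x d_in, effective weight W + AB), the helper names, and the standard deviations are all illustrative assumptions; the paper's actual scaling rules are not reproduced here.

```python
import random

def gaussian(rows, cols, std, rng):
    """Matrix (list of rows) with i.i.d. N(0, std^2) entries; std=0 gives all zeros."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def init_adapter(d_out, d_in, rank, std_a, std_b, rng):
    """Hypothetical helper: return (A, B) with A: d_out x rank, B: rank x d_in.

    std_b == 0 corresponds to the standard scheme where one matrix starts at zero.
    """
    A = gaussian(d_out, rank, std_a, rng)
    B = gaussian(rank, d_in, std_b, rng)
    return A, B

def initial_perturbation(A, B):
    """Largest |entry| of AB, i.e. how far W + AB starts from the pretrained W."""
    return max(abs(v) for row in matmul(A, B) for v in row)

rng = random.Random(0)
d_out, d_in, rank = 8, 8, 2  # toy sizes, chosen only for illustration

# Zero initialization: B = 0, so AB = 0 and fine-tuning starts exactly at W.
A0, B0 = init_adapter(d_out, d_in, rank, std_a=0.1, std_b=0.0, rng=rng)

# Non-zero initialization: both matrices random, so AB adds noise to W.
A1, B1 = init_adapter(d_out, d_in, rank, std_a=0.1, std_b=0.1, rng=rng)

zero_init_gap = initial_perturbation(A0, B0)
nonzero_init_gap = initial_perturbation(A1, B1)
print(zero_init_gap, nonzero_init_gap)
```

With zero initialization the starting perturbation is exactly zero, while with non-zero initialization it is small but non-zero. That non-zero perturbation is the "random noise" the abstract refers to.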
