Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics

Shiwei Li; Xiandi Luo; Xing Tang; Haozhao Wang; Hao Chen; Weihong Luo; Yuhua Li; Xiuqiang He; Ruixuan Li

arXiv:2505.23194·cs.LG·August 19, 2025

TL;DR

This paper studies how initializing both LoRA matrices to non-zero values affects fine-tuning dynamics. It finds improved robustness to suboptimal learning rates and shows that fine-tuning need not start strictly from the pretrained weights, with support from theoretical analysis and experiments.

Contribution

The paper provides the first theoretical analysis of non-zero initialization in LoRA and demonstrates that it makes fine-tuning more robust to the learning rate without requiring an exact start from the pretrained model.

Findings

01 Non-zero initialization improves robustness to suboptimal learning rates, particularly smaller ones.

02 The noise that non-zero initialization adds to the pretrained weights generally does not harm fine-tuning performance.

03 The theoretical analysis agrees with the empirical results.

Abstract

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method. In standard LoRA layers, one of the matrices, $A$ or $B$, is initialized to zero, ensuring that fine-tuning starts from the pretrained model. However, there is no theoretical support for this practice. In this paper, we investigate the impact of non-zero initialization on LoRA's fine-tuning dynamics from an infinite-width perspective. Our analysis reveals that, compared to zero initialization, simultaneously initializing $A$ and $B$ to non-zero values improves LoRA's robustness to suboptimal learning rates, particularly smaller ones. Further analysis indicates that although the non-zero initialization of $AB$ introduces random noise into the pretrained weight, it generally does not affect fine-tuning performance. In other words, fine-tuning does not need to strictly start from the pretrained model. The…
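The difference between the two initialization schemes described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper or its repository. The function names, the shapes (chosen so that the update is the product $AB$, matching the abstract's notation), the Gaussian distribution, and the `init_std` value are all assumptions made for illustration; the paper's actual variance choices are not reproduced here.

```python
import numpy as np


def init_lora(d_out, d_in, rank, non_zero, init_std=0.01, seed=0):
    """Return hypothetical LoRA factors A (d_out x rank) and B (rank x d_in).

    init_std is an illustrative value, not a setting from the paper.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, init_std, size=(d_out, rank))
    if non_zero:
        # Non-zero initialization: both factors are random.
        B = rng.normal(0.0, init_std, size=(rank, d_in))
    else:
        # Zero initialization: one factor is zero, so A @ B vanishes.
        B = np.zeros((rank, d_in))
    return A, B


def effective_weight(W, A, B):
    """Weight actually used by the adapted layer: pretrained W plus A @ B."""
    return W + A @ B


# Stand-in for a pretrained weight matrix (random values, for illustration only).
W = np.random.default_rng(1).normal(size=(8, 16))

A0, B0 = init_lora(8, 16, rank=2, non_zero=False)
A1, B1 = init_lora(8, 16, rank=2, non_zero=True)

# Largest absolute deviation from the pretrained weight at step 0.
zero_init_shift = np.abs(effective_weight(W, A0, B0) - W).max()
non_zero_shift = np.abs(effective_weight(W, A1, B1) - W).max()

print("zero init shift:    ", zero_init_shift)
print("non-zero init shift:", non_zero_shift)
```

With zero initialization the adapted layer reproduces the pretrained weight exactly at step 0. With non-zero initialization the product $AB$ perturbs it by a small random amount, which is the "random noise" the abstract refers to.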

Code & Models

Repositories

leopold1423/non_zero_lora-icml25 (PyTorch, official)
