Stabilized Fine-Tuning with LoRA in Federated Learning: Mitigating the Side Effect of Client Size and Rank via the Scaling Factor
Jiayu Huang, Xiaohu Wu, Tiantian He, Qicheng Lao

TL;DR
This paper introduces SFed-LoRA, a novel federated learning framework that stabilizes low-rank adaptation of large language models by optimizing the scaling factor, effectively mitigating aggregation variance and enabling high-rank adaptation.
Contribution
The paper provides a theoretical analysis and a new scaling factor for LoRA in federated learning, addressing instability issues caused by client aggregation and enabling high-rank adaptation.
Findings
SFed-LoRA prevents high-rank collapse.
It achieves improved stability and faster convergence.
It leaves the model architecture and inference latency unchanged.
Abstract
Large Language Models (LLMs) are pivotal in natural language processing. The impracticality of full fine-tuning has prompted Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA), optimizing low-rank matrices A and B. In distributed scenarios where privacy constraints necessitate Federated Learning (FL), however, the integration of LoRA is often unstable. Specifically, we identify that aggregating updates from multiple clients introduces statistical variance that scales with the client count, causing gradient collapse when using high-rank adapters. Existing scaling factor candidates, such as the one used by Rank-Stabilized LoRA, ignore the interaction caused by the aggregation process. To bridge this gap, this paper introduces Stabilized Federated LoRA (SFed-LoRA), a framework that theoretically characterizes the interaction between adapter rank and federated…
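To make the role of the scaling factor concrete, here is a minimal sketch of the two well-known scaling rules the abstract alludes to: the original LoRA factor alpha / r and the Rank-Stabilized LoRA factor alpha / sqrt(r). The SFed-LoRA factor itself is not stated in this excerpt, so it is deliberately not reproduced. The second function is a toy illustration only, not the paper's analysis: it assumes each client contributes independent, zero-mean, unit-variance values, and shows how entry-wise averaging over N clients shrinks the standard deviation to roughly 1 / sqrt(N). The names lora_scale and std_of_client_average are hypothetical and chosen for this example.

```python
import math
import random
import statistics


def lora_scale(alpha: float, rank: int, rule: str = "lora") -> float:
    """Return the factor that multiplies the low-rank update B @ A."""
    if rule == "lora":      # original LoRA: alpha / r
        return alpha / rank
    if rule == "rslora":    # Rank-Stabilized LoRA: alpha / sqrt(r)
        return alpha / math.sqrt(rank)
    raise ValueError(f"unknown rule: {rule}")


def std_of_client_average(num_clients: int, num_entries: int = 20000, seed: int = 0) -> float:
    """Empirical std of an entry-wise average over clients.

    Assumption for illustration: every client value is an independent
    N(0, 1) draw. Real federated updates are neither independent nor Gaussian.
    """
    rng = random.Random(seed)
    averaged = [
        sum(rng.gauss(0.0, 1.0) for _ in range(num_clients)) / num_clients
        for _ in range(num_entries)
    ]
    return statistics.pstdev(averaged)


if __name__ == "__main__":
    # The alpha / r factor shrinks much faster with rank than alpha / sqrt(r).
    for rank in (8, 64, 512):
        print(rank, lora_scale(16, rank, "lora"), lora_scale(16, rank, "rslora"))
    # Averaging over more clients shrinks the spread of the aggregate.
    for n in (1, 4, 16):
        print(n, round(std_of_client_average(n), 3), round(1 / math.sqrt(n), 3))
```

Under these assumptions, both the rank and the client count change the magnitude of the aggregated update, which is the kind of interaction the abstract says a rank-only scaling factor ignores.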
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Mobile Crowdsensing and Crowdsourcing
