Improving LoRA in Privacy-preserving Federated Learning
Youbang Sun, Zitao Li, Yaliang Li, Bolin Ding

TL;DR
This paper introduces FFA-LoRA, an improved low-rank adaptation method for privacy-preserving federated learning, which stabilizes training, reduces communication costs, and enhances performance over traditional LoRA methods.
Contribution
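The paper proposes FFA-LoRA (Federated Freeze-A LoRA), which keeps the randomly initialized low-rank matrix A fixed and trains only the zero-initialized matrix B. This addresses the training instability and hyper-parameter sensitivity of LoRA in federated learning while improving efficiency and robustness.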
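To make the "freeze A, train B" idea concrete, here is a minimal pure-Python sketch of one adapted linear layer. The class name `FFALoRALinear`, the Gaussian initialization of A and the `alpha / r` scaling (the usual LoRA convention) are illustrative assumptions, not the authors' implementation.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


class FFALoRALinear:
    """Hypothetical linear layer with an FFA-LoRA-style adapter (illustrative only).

    Effective weight: W + (alpha / r) * B @ A, where
      W (d_out x d_in) is the frozen pre-trained weight,
      A (r x d_in) is randomly initialized and then frozen,
      B (d_out x r) is zero-initialized and is the only trainable tensor.
    """

    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pre-trained weight
        # Frozen random factor; Gaussian init is an assumption for this sketch.
        self.A = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(r)]
        # Trainable factor, zero-initialized so the adapter starts as a no-op.
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def trainable_parameters(self):
        # Only B is optimized locally and sent to the server; A never changes.
        return {"B": self.B}

    def effective_weight(self):
        BA = matmul(self.B, self.A)
        return [[w + self.scale * u for w, u in zip(w_row, u_row)]
                for w_row, u_row in zip(self.W, BA)]

    def forward(self, x):
        """Apply the layer to an input vector x (a list of length d_in)."""
        column = [[v] for v in x]
        return [out[0] for out in matmul(self.effective_weight(), column)]


layer = FFALoRALinear(W=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], r=1, alpha=8)
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0, 11.0]: B = 0, so output equals W @ x
```

Because B starts at zero, the adapted layer initially reproduces the pre-trained layer exactly; only B moves during training.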
Findings
FFA-LoRA achieves more stable training in FL scenarios.
FFA-LoRA halves the communication cost compared to vanilla LoRA, since only one of the two low-rank matrices is trained and exchanged.
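The halving can be checked with simple parameter counting. This sketch assumes each adapted weight is d_out x d_in with rank r, so vanilla LoRA uploads B (d_out x r) and A (r x d_in) while FFA-LoRA uploads only B. The saving is exactly one half when d_out equals d_in, as for the square query and value projections LoRA commonly adapts; the function names and the 768/8 sizes are hypothetical.

```python
def lora_upload_size(d_out, d_in, r):
    """Parameters uploaded per adapted weight in vanilla federated LoRA: B (d_out x r) plus A (r x d_in)."""
    return d_out * r + r * d_in


def ffa_lora_upload_size(d_out, d_in, r):
    """Parameters uploaded per adapted weight when A is frozen: only B (d_out x r).

    d_in is accepted only for symmetry with lora_upload_size; A is never sent.
    """
    return d_out * r


# Hypothetical example: a 768 x 768 projection (RoBERTa-base hidden size) with rank 8.
d, r = 768, 8
print(lora_upload_size(d, d, r), ffa_lora_upload_size(d, d, r))  # 12288 6144
```

For a non-square weight the ratio is d_out / (d_out + d_in), so the saving is smaller or larger than one half depending on the shape.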
FFA-LoRA outperforms LoRA in various federated tasks.
Abstract
Low-rank adaptation (LoRA) is one of the most popular task-specific parameter-efficient fine-tuning (PEFT) methods on pre-trained language models because of its good performance and computational efficiency. LoRA injects the product of two trainable rank-decomposition matrices on top of each frozen pre-trained model module. However, when applied in the setting of privacy-preserving federated learning (FL), LoRA may become unstable for the following reasons: 1) the effects of data heterogeneity and multi-step local updates are non-negligible, 2) the additive noise applied to gradient updates to guarantee differential privacy (DP) can be amplified, and 3) the final performance is susceptible to hyper-parameters. A key factor leading to these phenomena is the discordance between jointly optimizing the two low-rank matrices by local clients and separately aggregating them by the central server.…
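The "discordance" between joint local optimization and separate server-side aggregation shows up in a toy example. If each client trains both factors and the server averages B and A separately (equal-weight FedAvg), the product of the averages is generally not the average of the products. If A is frozen and shared, the update B @ A is linear in B, so averaging B is exact. The two clients and their values below are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def average(mats):
    """Element-wise mean of equally shaped matrices (equal-weight FedAvg)."""
    n = len(mats)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*mats)]


# Two hypothetical clients, rank r = 1, 2 x 2 weight update Delta W = B @ A.
B1, A1 = [[1.0], [0.0]], [[1.0, 0.0]]
B2, A2 = [[0.0], [1.0]], [[0.0, 1.0]]

# What the clients jointly learned: the average of their local updates.
ideal = average([matmul(B1, A1), matmul(B2, A2)])
# What the server obtains by averaging B and A separately.
separate = matmul(average([B1, B2]), average([A1, A2]))
print(ideal)     # [[0.5, 0.0], [0.0, 0.5]]
print(separate)  # [[0.25, 0.25], [0.25, 0.25]]

# With A frozen and shared, Delta W is linear in B, so averaging B is exact.
A = [[1.0, 2.0]]
print(average([matmul(B1, A), matmul(B2, A)]) == matmul(average([B1, B2]), A))  # True
```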
Peer Reviews
Decision: ICLR 2024 poster
I like the motivation of the FFA-LoRA algorithm, and appreciate the attempt to provide some analysis on the caveats of LoRA. The experiments on two models (RoBERTa and LLaMA) fine-tuning on a subset of GLUE tasks and a GSM-8K language generation task in both non-DP and DP settings show good empirical performance of FFA-LoRA.
I thank the authors for providing details of the experimental setup. However, the federated learning setting in experiments seems a bit unconventional with a very small number of clients (only 3 clients). This might be categorized as a cross-silo setting, but it would be good to clearly discuss the targeted application (https://arxiv.org/abs/1912.04977 table 1, https://arxiv.org/abs/2107.06917 section 3.1). While I appreciate the motivation of analyzing LoRA in section 3, none of the explanati
+ The motivation is sound and the paper is easy to follow.
+ Empirical results show competitive performance under different differential privacy and parameter budgets.
+ Empirical results are comprehensive, covering multiple tasks and an ablation study.
+ The motivation is straightforward and intuitive, without theoretical insights.
1. The study on federated LoRA is timely.
2. The proposed approach is simple to implement.
3. The authors provide case studies to highlight the limitations of the vanilla LoRA and motivate their approach.
1. The benefit of FFA-LoRA on differential privacy (DP) is not very well backed by empirical evaluation. The performance gap between the vanilla LoRA and the proposed FFA-LoRA remains the same across various privacy budgets $\epsilon$, including $\epsilon = 0$. Such an empirical result suggests that the impact of DP noise is the same on both the vanilla LoRA and the proposed FFA-LoRA.
2. I do not see why the proposed FFA-LoRA is free from tuning the hyper-parameter $\alpha$. In Section 4, the a
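On the question of tuning $\alpha$: one standard way to see the coupling, which is not necessarily the argument made in the paper's Section 4, assumes plain SGD, a frozen A and a zero-initialized B. With scale $s = \alpha / r$, the gradient with respect to B is $s \, G A^\top$ (where $G$ is the gradient with respect to the effective weight), so each step changes $s B A$ by $-\eta s^2 G A^\top A$. Only the product $\eta s^2$ matters, so rescaling $\alpha$ is equivalent to rescaling the learning rate. The rank-1 toy model, data and function name below are hypothetical, and the exact correspondence relies on plain SGD rather than an adaptive optimizer.

```python
def train_ffa_rank1(w0, a, data, scale, lr, steps):
    """Plain SGD on the scalar b of a rank-1 adapter with frozen A (hypothetical toy model).

    Effective weight of a 1 x d linear model: w0 + scale * b * a, with b initialized to 0.
    Loss: mean over the data of 0.5 * (w_eff . x - y) ** 2.
    """
    b = 0.0
    for _ in range(steps):
        w_eff = [wi + scale * b * ai for wi, ai in zip(w0, a)]
        # Gradient of the loss with respect to the effective weight.
        g = [0.0] * len(w0)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w_eff, x)) - y
            g = [gi + err * xi / len(data) for gi, xi in zip(g, x)]
        # Chain rule: dL/db = scale * (g . a); A itself is never updated.
        b -= lr * scale * sum(gi * ai for gi, ai in zip(g, a))
    return [wi + scale * b * ai for wi, ai in zip(w0, a)]


w0, a = [0.0, 0.0], [1.0, 0.5]
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 2.5)]
# scale = alpha / r; multiplying it by 4 matches multiplying the learning rate by 4 ** 2.
w_a = train_ffa_rank1(w0, a, data, scale=4.0, lr=0.01, steps=50)
w_b = train_ffa_rank1(w0, a, data, scale=1.0, lr=0.16, steps=50)
print(all(abs(p - q) < 1e-9 for p, q in zip(w_a, w_b)))  # True
```

Keeping the learning rate fixed while changing the scale does change the trajectory, which is why, under these assumptions, tuning the learning rate alone covers the effect of $\alpha$.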
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques
