Improving LoRA in Privacy-preserving Federated Learning

Youbang Sun; Zitao Li; Yaliang Li; Bolin Ding

arXiv:2403.12313 · cs.LG · March 20, 2024 · 5 citations

Open Access 3 Reviews

TL;DR

This paper introduces FFA-LoRA, an improved low-rank adaptation method for privacy-preserving federated learning, which stabilizes training, reduces communication costs, and enhances performance over traditional LoRA methods.

Contribution

The paper proposes FFA-LoRA, which fixes the randomly initialized non-zero low-rank matrices and fine-tunes only the zero-initialized ones. This addresses the instability and hyper-parameter sensitivity of federated LoRA and improves efficiency and robustness.

Findings

01. FFA-LoRA achieves more stable training in FL scenarios.

02. It halves the communication cost compared to vanilla LoRA.

03. FFA-LoRA outperforms LoRA in various federated tasks.
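The halving in finding 02 can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a single adapted weight of shape d × k with LoRA factors B (d × r) and A (r × k), assumes that a client uploads only the factors it trains, and uses hypothetical function names and sizes. Under those assumptions, freezing A cuts the upload exactly in half when d = k.

```python
def lora_upload_params(d: int, k: int, r: int) -> int:
    """Parameters uploaded per adapted weight when both B (d x r) and A (r x k) are trained."""
    return d * r + r * k


def frozen_a_upload_params(d: int, k: int, r: int) -> int:
    """Parameters uploaded per adapted weight when A is frozen and only B (d x r) is trained."""
    return d * r


# Illustrative sizes (assumed, not taken from the paper): a square 1024 x 1024 weight, rank 8.
d, k, r = 1024, 1024, 8
full = lora_upload_params(d, k, r)
frozen = frozen_a_upload_params(d, k, r)
print(f"vanilla LoRA: {full} params, frozen-A variant: {frozen} params, ratio: {frozen / full}")
```

For non-square weights the ratio is d / (d + k) rather than exactly one half.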

Abstract

Low-rank adaptation (LoRA) is one of the most popular task-specific parameter-efficient fine-tuning (PEFT) methods on pre-trained language models for its good performance and computational efficiency. LoRA injects a product of two trainable rank decomposition matrices over the top of each frozen pre-trained model module. However, when applied in the setting of privacy-preserving federated learning (FL), LoRA may become unstable due to the following facts: 1) the effects of data heterogeneity and multi-step local updates are non-negligible, 2) additive noise enforced on updating gradients to guarantee differential privacy (DP) can be amplified and 3) the final performance is susceptible to hyper-parameters. A key factor leading to these phenomena is the discordance between jointly optimizing the two low-rank matrices by local clients and separately aggregating them by the central server.…
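The discordance named in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes each client i holds factors B_i and A_i whose product B_i A_i is its weight update, that the server averages the factors separately, and that the fixed matrix in FFA-LoRA is A. All sizes and variable names are hypothetical. The mean of the products generally differs from the product of the means, whereas with a shared frozen A the two coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, n_clients = 8, 6, 2, 3  # illustrative sizes, not from the paper

# Hypothetical per-client LoRA factors after local training (update_i = B_i @ A_i).
A_clients = [rng.normal(size=(r, k)) for _ in range(n_clients)]
B_clients = [rng.normal(size=(d, r)) for _ in range(n_clients)]

# Ideal aggregate: the average of the clients' full low-rank updates.
ideal = np.mean([B @ A for A, B in zip(A_clients, B_clients)], axis=0)

# Separate aggregation: average A and B independently, then multiply.
A_avg = np.mean(A_clients, axis=0)
B_avg = np.mean(B_clients, axis=0)
lora_gap = np.linalg.norm(ideal - B_avg @ A_avg)

# Frozen-A variant (assumed here): all clients share one fixed A, only B_i differs.
A_fixed = rng.normal(size=(r, k))
ideal_frozen = np.mean([B @ A_fixed for B in B_clients], axis=0)
frozen_gap = np.linalg.norm(ideal_frozen - B_avg @ A_fixed)

print(f"gap with both factors trained: {lora_gap:.4f}")
print(f"gap with A frozen:             {frozen_gap:.2e}")
```

The second gap vanishes up to floating-point error because the update is linear in B once A is fixed, so averaging B is the same as averaging the updates.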

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

I like the motivation of the FFA-LoRA algorithm, and appreciate the attempt to provide some analysis on the caveats of LoRA. The experiments on two models (RoBERTa and LLaMA) fine-tuning on a subset of GLUE tasks and a GSM-8K language generation task in both non-DP and DP settings show good empirical performance of FFA-LoRA.

Weaknesses

I thank the authors for providing details of the experimental setup. However, the federated learning setting in experiments seems a bit unconventional with a very small number of clients (only 3 clients). This might be categorized as a cross-silo setting, but it would be good to clearly discuss the targeted application (https://arxiv.org/abs/1912.04977 table 1, https://arxiv.org/abs/2107.06917 section 3.1). While I appreciate the motivation of analyzing LoRA in section 3, none of the explanati

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

+ The motivation is sound and the paper writing is easy to follow.
+ Empirical results show competitive performance under different differential privacy and parameter budget.
+ Empirical results are comprehensive, considering multiple tasks and ablation study.

Weaknesses

+ The motivation is straightforward and intuitive, without theoretical insights.

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. The study on federated LoRA is timely.
2. The proposed approach is simple to implement.
3. The authors provide case studies to highlight the limitations of the vanilla LoRA and motivate their approach.

Weaknesses

1. The benefit of FFA-LoRA on differential privacy (DP) is not very well backed by empirical evaluation. The performance gap between the vanilla LoRA and the proposed FFA-LoRA remains the same across various privacy budgets $\epsilon$, including $\epsilon = 0$. Such an empirical result suggests that the impact of DP noise is the same on both the vanilla LoRA and the proposed FFA-LoRA.
2. I do not see why the proposed FFA-LoRA is free from tuning the hyper-parameter $\alpha$. In Section 4, the a

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques