Selective Aggregation for Low-Rank Adaptation in Federated Learning

Pengxin Guo; Shuang Zeng; Yanran Wang; Huijie Fan; Feifei Wang; Liangqiong Qu

arXiv:2410.01463 · cs.LG · March 24, 2025

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper introduces FedSA-LoRA, a federated learning method that selectively shares low-rank matrices to improve efficiency and performance, based on the distinct roles of matrices in learning general versus client-specific knowledge.

Contribution

The paper proposes FedSA-LoRA, which shares only the LoRA $A$ matrices with the server during federated learning while keeping the $B$ matrices local to each client. It extends the same paradigm to other LoRA variants (rsLoRA and VeRA), giving a general recipe for combining LoRA with federated learning.

Findings

01

FedSA-LoRA outperforms traditional methods in natural language tasks.

02

Selective sharing of A matrices maintains model performance while reducing communication.

03

The approach generalizes well across different LoRA variants.
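
As a rough illustration of the communication claim in finding 02, the sketch below counts the parameters one client would upload per adapted layer, assuming the standard LoRA shapes ($B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$). The function name `lora_upload_params` and the layer sizes are hypothetical and are not taken from the paper or its repository.

```python
def lora_upload_params(d, k, r, share_b=True):
    """Parameters one client uploads per round for a single adapted d x k layer.

    Assumes standard LoRA shapes: A is (r x k), B is (d x r).
    """
    a_params = r * k
    b_params = d * r
    return a_params + (b_params if share_b else 0)


# Hypothetical layer size and rank, chosen only for illustration.
d, k, r = 768, 768, 8

full = lora_upload_params(d, k, r, share_b=True)     # upload both A and B
a_only = lora_upload_params(d, k, r, share_b=False)  # upload A only
ratio = a_only / full

print(full, a_only, ratio)  # prints: 12288 6144 0.5
```

For a square layer the upload is halved; for a non-square layer the saving depends on the ratio of $d$ to $k$.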

Abstract

We investigate LoRA in federated learning through the lens of the asymmetry analysis of the learned $A$ and $B$ matrices. In doing so, we uncover that $A$ matrices are responsible for learning general knowledge, while $B$ matrices focus on capturing client-specific knowledge. Based on this finding, we introduce Federated Share-A Low-Rank Adaptation (FedSA-LoRA), which employs two low-rank trainable matrices $A$ and $B$ to model the weight update, but only $A$ matrices are shared with the server for aggregation. Moreover, we delve into the relationship between the learned $A$ and $B$ matrices in other LoRA variants, such as rsLoRA and VeRA, revealing a consistent pattern. Consequently, we extend our FedSA-LoRA method to these LoRA variants, resulting in FedSA-rsLoRA and FedSA-VeRA. In this way, we establish a general paradigm for integrating LoRA with FL, offering guidance for future…
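
The aggregation rule described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`init_client`, `aggregate_A`, `broadcast_A`), the plain weighted average, and the stand-in for local training are all hypothetical.

```python
import numpy as np


def init_client(d, k, r, rng):
    """Standard LoRA-style init (assumed): A random Gaussian, B zeros."""
    return {"A": rng.normal(0.0, 0.02, size=(r, k)), "B": np.zeros((d, r))}


def aggregate_A(clients, weights=None):
    """Server step: weighted average of the clients' A matrices only."""
    n = len(clients)
    if weights is None:
        weights = np.full(n, 1.0 / n)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return sum(w * c["A"] for w, c in zip(weights, clients))


def broadcast_A(clients, A_global):
    """Clients overwrite their A with the aggregated one; B is left untouched."""
    for c in clients:
        c["A"] = A_global.copy()


rng = np.random.default_rng(0)
clients = [init_client(d=8, k=6, r=2, rng=rng) for _ in range(3)]

# Stand-in for local training: give each client a distinct B (hypothetical).
for i, c in enumerate(clients):
    c["B"] += i + 1

local_As = [c["A"].copy() for c in clients]  # kept only to check the average
A_global = aggregate_A(clients)
broadcast_A(clients, A_global)
```

After one round every client holds the same $A$ while each $B$ remains client-specific, which mirrors the split between general and client-specific knowledge that the abstract describes.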

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 4

Strengths

This paper has an interesting motivation and shows the different roles of the low-rank matrices $A$ and $B$ in federated fine-tuning.

Weaknesses

I have the following concerns: 1. Lemma 1 is very interesting to me, and the verification experiments show that all the matrices $A_i$ ($i$ is the client index) seem to be the same while the $B_i$ differ from each other. We know that different clients have independent initializations of $A$, but they finally end up with exactly the same $A$, since the similarity of $A$ across clients is $1.0$ in Figure 2. As you mentioned in Figure 3 in the Appendix that the learned $A$ matrices are different from the initialized $A$ matrices, s

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. The paper is generally well-motivated and easy to follow. 2. This work introduces the shared-A LoRA technique into the FL framework, making it promising compared to previous work where the down-projection matrix (A matrix) is kept frozen. 3. Comprehensive experimental results are provided.

Weaknesses

1. While this work introduces a shared-A LoRA framework into FL, such a technique has already been introduced in the MoE area [1] under similar motivations. Although the novelty of introducing it into FL should be recognized, the introduction of such a framework cannot be solely credited to this work. 2. On page 20, Figure 3, the authors claim that "A matrices are different from the initialized A matrices, indicating that the A matrices are updated." However, the cosine similarity between the l

Reviewer 03 · Rating 5 · Confidence 3

Strengths

(1) The paper's analysis of the distinct roles of matrices A and B clearly demonstrates its insights and motivations. (2) The proposed method improves upon existing solutions and achieves certain performance enhancements.

Weaknesses

(1) One of the significant contributions of the paper is the asymmetric analysis of matrices A and B, concluding that matrix A is responsible for general knowledge and matrix B for domain-specific knowledge. However, this perspective has already been proposed in several works, such as HydraLoRA [1], which also designed an asymmetric LoRA structure based on this concept. The paper should properly cite relevant literature, and these existing findings somewhat diminish the paper's contribution. (2

Code & Models

Repositories

Pengxin-Guo/FedSA-LoRA
PyTorch · Official

Taxonomy

Topics: Brain Tumor Detection and Classification · Machine Learning and ELM · Traffic Prediction and Management Techniques

Methods: Focus