FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models
Raghav Singhal, Kaustubh Ponkshe, Praneeth Vepakomma

TL;DR
FedEx-LoRA introduces an exact aggregation method for federated LoRA fine-tuning, significantly improving accuracy and efficiency in distributed training of foundation models across various NLP tasks.
Contribution
The paper proposes FedEx-LoRA, a novel method that achieves exact updates in federated LoRA fine-tuning, overcoming the inaccuracy of traditional federated averaging methods.
Findings
Consistent performance improvements over state-of-the-art methods.
Significant reduction in update deviations from the ideal solution.
Broad applicability across multiple NLP tasks.
Abstract
Low-Rank Adaptation (LoRA) is a popular technique for efficient fine-tuning of foundation models. However, applying LoRA in federated learning environments, where data is distributed across multiple clients, presents unique challenges. Existing methods rely on traditional federated averaging of LoRA adapters, resulting in inexact updates. To address this, we propose Federated Exact LoRA, or FedEx-LoRA, which adds a residual error term to the pretrained frozen weight matrix. Our approach achieves exact updates with minimal computational and communication overhead, preserving LoRA's efficiency. We evaluate the method on various models across arithmetic reasoning, commonsense reasoning, natural language understanding and natural language generation tasks, showing consistent performance gains over state-of-the-art methods across multiple settings. Through extensive analysis, we quantify…
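The aggregation idea described in the abstract can be illustrated with a small NumPy sketch. This is a minimal illustration and not the authors' implementation: the function names, the matrix sizes, the equal weighting of clients, and the omission of LoRA's scaling factor are all assumptions made here. It compares separately averaged adapters with the average of the per-client products, then folds the difference into the frozen weight.

```python
import numpy as np

def fedavg_lora(Bs, As):
    """Average the B and A adapters separately (conventional FedAvg on LoRA)."""
    return np.mean(Bs, axis=0), np.mean(As, axis=0)

def residual_term(Bs, As):
    """Mean of the client products B_i @ A_i minus the product of the means."""
    B_bar, A_bar = fedavg_lora(Bs, As)
    mean_of_products = np.mean([B @ A for B, A in zip(Bs, As)], axis=0)
    return mean_of_products - B_bar @ A_bar

# Hypothetical toy sizes: 3 clients, an 8x6 weight matrix, rank-2 adapters.
rng = np.random.default_rng(0)
n_clients, d, k, r = 3, 8, 6, 2
W0 = rng.normal(size=(d, k))                      # frozen pretrained weight
Bs = [rng.normal(size=(d, r)) for _ in range(n_clients)]
As = [rng.normal(size=(r, k)) for _ in range(n_clients)]

B_bar, A_bar = fedavg_lora(Bs, As)
residual = residual_term(Bs, As)

# Ideal update: the average of the clients' full-rank updates.
ideal_W = W0 + np.mean([B @ A for B, A in zip(Bs, As)], axis=0)
# Inexact update: product of the separately averaged adapters.
naive_W = W0 + B_bar @ A_bar
# Exact update: add the residual to the frozen weight, keep the averaged adapters.
exact_W = (W0 + residual) + B_bar @ A_bar

naive_gap = np.linalg.norm(ideal_W - naive_W)     # Frobenius norm of the deviation
exact_gap = np.linalg.norm(ideal_W - exact_W)
print(f"naive deviation: {naive_gap:.4f}, exact deviation: {exact_gap:.2e}")
```

In this sketch the residual is a dense `d x k` matrix, the quantity whose communication cost the reviews below discuss.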
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
This work proposes FedEx-LoRA, a method that addresses the inexact aggregation problem that arises when LoRA is applied in federated learning environments for fine-tuning large language models.
There are some concerns about the proposed method: 1) In federated learning, the network bandwidth between the server and clients is often very limited. This work adds substantial communication overhead to each training round. The extra data communicated between the server and clients equals the entire model size, which is a heavy overhead, especially in federated learning environments. 2) The proposed method requires calculating n matrix multiplications, where
The paper introduces a straightforward yet effective method that ensures exact aggregation in LoRA-based fine-tuning for federated learning. While it has a slightly higher communication cost than FedIT and FFA-LoRA, the increase is minimal given the significant improvement in overall performance. Moreover, the method's effectiveness is validated through comprehensive experiments and analysis, supporting its claim of exact aggregation and the resulting better overall performance. Additiona
The main issue is that the contribution may seem too incremental for ICLR, as the method primarily consists of a straightforward adjustment in the aggregation phase: adding back the discrepancy between the average of the products and the product of the averages. This method, while empirically effective, does not introduce a significant innovation or a novel approach of the kind typically expected at ICLR. Additionally, there are practical scalability concerns regarding t
- Originality: The paper introduces a novel solution to the inexact aggregation problem in federated fine-tuning with LoRA by adding a residual error term directly to the pretrained weight matrix. This innovative approach ensures exact updates while preserving the low-rank efficiency of LoRA, addressing a key limitation in existing methods.
- Quality: The authors provide thorough theoretical justification and extensive empirical evaluations across multiple benchmarks. The experiments consistent
- Model limitations: The paper evaluates FedEx-LoRA primarily on RoBERTa and GPT-2, which are smaller foundation models. It remains uncertain how this method performs on larger models, such as Llama and Mistral, where scalability challenges might differ.
- Task scope: The evaluation focuses on standard NLP tasks. Assessing performance on more complex tasks, such as reasoning and inference, would strengthen the paper’s claims on generalizability and robustness.
- Quantization challenges: Introd
(1) The question of whether to use FedAvg on LoRA is valuable. It raises the issue of whether the optimization target of federated LoRA fine-tuning is the full global model $W$ or a global LoRA $B_GA_G$. (2) Figure 2 provides some interesting insights for FL researchers. The authors find that the deviations decrease as the model depth increases, which may prompt deeper reflection on the relationship between model architecture and fine-tuning.
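The two candidate targets mentioned in point (1) can be written out explicitly. The following is a sketch under assumptions introduced here rather than taken from the reviews: $n$ equally weighted clients, client adapters $B_i$ and $A_i$, a frozen pretrained weight $W_0$, and the LoRA scaling factor omitted.

$$
W = W_0 + \frac{1}{n}\sum_{i=1}^{n} B_i A_i
\qquad\text{versus}\qquad
W_0 + B_G A_G,
\quad B_G = \frac{1}{n}\sum_{i=1}^{n} B_i,
\quad A_G = \frac{1}{n}\sum_{i=1}^{n} A_i .
$$

The gap between the two is the discrepancy between the average of the products and the product of the averages:

$$
\Delta W_{\mathrm{res}} = \frac{1}{n}\sum_{i=1}^{n} B_i A_i - B_G A_G ,
$$

so that $W = (W_0 + \Delta W_{\mathrm{res}}) + B_G A_G$. Under these assumptions, $\Delta W_{\mathrm{res}}$ is zero in general only when the clients' adapters coincide, for example when $n = 1$.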
**Contribution**: The authors state in the paper that they identify a critical discrepancy in traditional federated averaging of LoRA adapters. However, this issue has already been raised and studied in detail. [1] and [2] identified this problem, and [2] provided a solution for eliminating this error. None of these previous works is discussed in detail or listed as a baseline. The reviewer believes that the authors have not adequately researched this field, so the contribution is limited. **Method*
Taxonomy
Topics: Cryptography and Data Security
