Collaborative and Efficient Personalization with Mixtures of Adaptors
Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč

TL;DR
This paper introduces FLoRAL, a federated learning framework that efficiently personalizes models for clients using low-rank adaptors, improving generalization and robustness in data-scarce scenarios.
Contribution
FLoRAL is a novel parameter-efficient method that personalizes federated models through client-specific mixtures of low-rank adaptors, casting it as a multi-task learning problem.
Findings
FLoRAL outperforms full model mixtures in data-scarce settings.
FLoRAL provides better personalization than locally tuned adaptors.
Theoretical analysis shows improved gradient variance reduction.
Abstract
Heterogeneous data is prevalent in real-world federated learning. We propose a parameter-efficient framework, Federated Low-Rank Adaptive Learning (FLoRAL), that allows clients to personalize in groups by mixing between low-rank adaptors, where the mixtures are client-specific. FLoRAL is a model parameterization that casts personalized federated learning as a multi-task learning problem, with weight sharing as an implicit regularizer. It is memory-efficient, as the personalized parameters (i.e., base model + adaptors) are all federated. Our results show that FLoRAL can generalize better than a mixture of full models when data are scarce. It can also consistently personalize better than models with a locally tuned adaptor per client. This demonstrates the benefits of "federated personalization" and its robustness against overfitting. We derive the convergence rates and show theoretically…
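As a rough illustration of the parameterization the abstract describes, the sketch below builds one client's weight matrix as a shared base plus a client-specific mixture of low-rank adaptors, and compares the parameter count against a mixture of full models. The form W + sum_c pi_c * U_c V_c, the helper names, the shapes, and the mixture values are all assumptions made for illustration; the abstract does not confirm them.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def personalized_weight(base, adaptors, mixture):
    """Return base + sum_c mixture[c] * (U_c @ V_c) for one client (assumed form)."""
    weight = [row[:] for row in base]  # copy so the shared base stays untouched
    for pi_c, (u, v) in zip(mixture, adaptors):
        delta = matmul(u, v)  # low-rank update with the same shape as base
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                weight[i][j] += pi_c * value
    return weight


def count_params(d_out, d_in, rank, num_clusters):
    """Parameters in C full matrices vs. one base plus C rank-r adaptors."""
    full_mixture = num_clusters * d_out * d_in
    adaptor_mixture = d_out * d_in + num_clusters * rank * (d_out + d_in)
    return full_mixture, adaptor_mixture


rng = random.Random(0)
d_out, d_in, rank, num_clusters = 4, 6, 1, 3  # hypothetical toy sizes
base = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
adaptors = [
    (
        [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(d_out)],  # U_c
        [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)],   # V_c
    )
    for _ in range(num_clusters)
]
mixture = [0.7, 0.2, 0.1]  # hypothetical client-specific mixture weights
client_weight = personalized_weight(base, adaptors, mixture)
```

In this sketch only the base and the adaptors would be shared across clients, while each client keeps its own short mixture vector. For a hypothetical 512 x 512 layer with rank 8 and 4 clusters, `count_params(512, 512, 8, 4)` gives 1,048,576 parameters for the full-model mixture against 294,912 for the base plus adaptors.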
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The strength of this paper is its exploration of parameter-efficient personalization in federated learning which is an important topic.
See my comments below:
* The paper lacks a clear presentation of the exact problem it aims to solve. In multiple sections, such as the abstract (lines 11-28) and the introduction (lines 51-57, 72-77), the objectives and approach remain ambiguous. The methodology section does not clearly delineate the specific FL problem at hand or how FLoRAL is a unique solution to this problem. Overall, the paper needs clearer writing.
* Section 3 introduces five different FL setups, but it is unclear which
I commend the authors for their clear analysis of the weight selection strategy for aggregation and their thorough examination of how uncertainties in estimating the vector $\pi$ impact convergence.
1. The experiments do not include all relevant baselines. Specifically, methods based on Mixture of Experts (MoE), such as [1], and shared LoRA, such as [2, 3], are also closely related to the proposed approach.
2. The convergence analysis and the connection between FML and MFL problems are established under the assumption of convexity. However, the experiments involving neural networks generally do not meet this convexity assumption. A stronger alignment between the analysis and experiments wou
- This paper has a nice figure illustration.
- FLoRAL uses low-rank adaptation (LoRA) to personalize the model for each client. This significantly reduces the number of parameters that need to be stored and transmitted compared to using full models.
- Adapters in FLoRAL are learned collaboratively between clients, leveraging information from multiple data sources.
Although many efforts can be witnessed in this paper, we still find that the structure of this paper is hard to follow, and we can see a lack of explanation/motivation behind some techniques:
1) The authors implement the aggregated gradient every H steps over the whole time horizon T and use the modulo operator to describe what happens at specific timesteps. However, this overcomplicates the problem compared with simply using 'local' and 'global' rounds as usual.
2) More discussion is needed on t
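The two ways of describing the schedule raised in point 1 of this review can be compared in a small sketch. It uses plain Local SGD with model averaging on hypothetical one-dimensional quadratic objectives, which is a stand-in chosen for illustration and not the paper's algorithm. The function names, learning rate, and targets are assumptions.

```python
def local_step(models, targets, lr):
    """One gradient step per client on the toy objective 0.5 * (x - target) ** 2."""
    return [x - lr * (x - target) for x, target in zip(models, targets)]


def average(models):
    """Replace every client model with the mean of all client models."""
    mean = sum(models) / len(models)
    return [mean for _ in models]


def flat_schedule(targets, total_steps, sync_every, lr=0.1):
    """Single loop over t = 1..T; synchronize whenever t % H == 0."""
    models = [0.0 for _ in targets]
    for t in range(1, total_steps + 1):
        models = local_step(models, targets, lr)
        if t % sync_every == 0:
            models = average(models)
    return models


def nested_schedule(targets, num_rounds, local_steps, lr=0.1):
    """Global rounds, each made of H local steps followed by one averaging step."""
    models = [0.0 for _ in targets]
    for _ in range(num_rounds):
        for _ in range(local_steps):
            models = local_step(models, targets, lr)
        models = average(models)
    return models


targets = [1.0, 3.0]  # hypothetical per-client optima
flat = flat_schedule(targets, total_steps=200, sync_every=5)
nested = nested_schedule(targets, num_rounds=40, local_steps=5)
```

When T is a multiple of H, both functions perform the same operations in the same order, so the choice between the modulo form and the round-based form is one of notation in this toy setting.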
- Inspired by LoRA, aggregating partial parameters (adaptors) of federated learning models for better personalization is interesting. It is parameter-efficient and allows local models to benefit both from collaborations and the generalization of their data.
- The idea is well-motivated and presented clearly in the introduction. The results show that a mixture of adaptors sometimes can beat a mixture of models.
- The theoretical analysis is provided. Also, some insights into why aggregating only
- **Related work is not comprehensively compared and discussed**: Firstly, there are many recent works in personalization for federated learning, such as [1, 2]; the authors could discuss them in the related work section and compare against them in the experimental section. Secondly, in related work, the authors can discuss how FLoRAL is different from LoRA-related federated learning methods (what are the strengths, differences, etc.).
- **The experimental section is poorly presented**: The experiment part could be
Taxonomy
Topics: Cellular Automata and Applications · Evolutionary Algorithms and Applications
Methods: Stochastic Gradient Descent · Local SGD
