FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE
Khiem Le, Tuan Tran, Ting Hua, Nitesh V. Chawla

TL;DR
FLAME is a federated fine-tuning framework built on Sparse Mixture-of-Experts (SMoE). It keeps the full, uncompressed global LoRA matrices and adapts to each client's resources by varying the number of activated experts, outperforming prior compression-based approaches.
Contribution
The paper presents FLAME, an SMoE-based federated fine-tuning method that keeps the full global LoRA matrices and adapts to client resources by varying the number of activated experts per client. It addresses the challenges this introduces through rescaling and aggregation.
Findings
FLAME outperforms existing resource-adaptive federated methods.
It effectively handles output magnitude mismatch and expert imbalance.
It is robust across diverse computational settings.
Abstract
Existing resource-adaptive LoRA federated fine-tuning methods enable clients to fine-tune models using compressed versions of global LoRA matrices, in order to accommodate various compute resources across clients. This compression requirement leads to suboptimal performance due to information loss. To address this, we propose FLAME, a novel federated learning framework based on the Sparse Mixture-of-Experts (SMoE) architecture. Unlike prior approaches, FLAME retains full (uncompressed) global LoRA matrices and achieves client-side adaptability by varying the number of activated experts per client. However, incorporating SMoE into federated learning introduces unique challenges: the mismatch in output magnitude from partial expert activation and the imbalance in expert training quality across clients. FLAME tackles these challenges through a lightweight rescaling…
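The output-magnitude mismatch mentioned in the abstract can be illustrated with a small NumPy sketch. It is not FLAME's implementation: the softmax router, the shapes, the random weights, and the function name smoe_lora_delta are all assumptions made for illustration, and no rescaling is applied. It only shows that activating fewer LoRA experts gives a client less total gate weight, so the same input produces an update of a different scale.

```python
import numpy as np

def smoe_lora_delta(x, router_w, experts_a, experts_b, k):
    """LoRA update for one token using only the k highest-scoring experts.

    Hypothetical layout (not taken from the paper):
      router_w:  (n_experts, d)     router projection
      experts_a: (n_experts, r, d)  LoRA down-projections
      experts_b: (n_experts, d, r)  LoRA up-projections
    """
    logits = router_w @ x                       # one score per expert
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                        # softmax over all experts
    top = np.argsort(gates)[-k:]                # indices of the k active experts
    delta = np.zeros(experts_b.shape[1])
    for e in top:
        # Each active expert contributes its gate-weighted low-rank update.
        delta += gates[e] * (experts_b[e] @ (experts_a[e] @ x))
    return delta, top, gates

rng = np.random.default_rng(0)
d, r, n_experts = 16, 4, 8
x = rng.normal(size=d)
router_w = rng.normal(size=(n_experts, d))
experts_a = rng.normal(size=(n_experts, r, d))
experts_b = rng.normal(size=(n_experts, d, r))

# Clients with smaller budgets activate fewer experts (smaller k).
for k in (2, 4, 8):
    delta, top, gates = smoe_lora_delta(x, router_w, experts_a, experts_b, k)
    print(f"k={k}  active gate mass={gates[top].sum():.3f}  "
          f"update norm={np.linalg.norm(delta):.3f}")
```

With k equal to the number of experts the active gate mass is 1; for smaller k it is strictly lower, which is one simple way to see why updates trained under different activation budgets are not directly comparable.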
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Stochastic Gradient Optimization Techniques
