FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE

Khiem Le; Tuan Tran; Ting Hua; Nitesh V. Chawla

arXiv:2506.16600 · cs.LG · July 16, 2025

TL;DR

FLAME introduces a federated fine-tuning framework for LLMs based on Sparse Mixture-of-Experts (SMoE). It retains the full, uncompressed global LoRA matrices and adapts to each client's compute resources, outperforming prior approaches that have clients train on compressed LoRA matrices.
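The page describes prior methods only as using "compressed versions of global LoRA matrices." As an assumed illustration, the sketch below uses rank truncation (keeping the first `rank` columns of B and rows of A) as the compression scheme, to show where the information loss comes from. The helper names `lora_delta` and `frobenius_gap` and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def lora_delta(b: Matrix, a: Matrix, rank: int) -> Matrix:
    """LoRA update B[:, :rank] @ A[:rank, :].

    Using rank < r models compression by rank truncation (an assumption here).
    """
    return matmul([row[:rank] for row in b], a[:rank])


def frobenius_gap(x: Matrix, y: Matrix) -> float:
    """Frobenius norm of x - y."""
    return sum((p - q) ** 2 for rx, ry in zip(x, y) for p, q in zip(rx, ry)) ** 0.5


# Toy global LoRA factors: rank r = 2, d_out = d_in = 3.
B = [[1.0, 0.5],
     [0.0, 1.0],
     [2.0, 0.0]]
A = [[1.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]

full = lora_delta(B, A, rank=2)        # full-rank update kept on the server
compressed = lora_delta(B, A, rank=1)  # truncated update a low-resource client sees
gap = frobenius_gap(full, compressed)  # norm of the dropped rank-1 term B[:, 1] A[1, :]
```

In this toy the gap is sqrt(5) ≈ 2.236, which is exactly the norm of the discarded rank-1 term. A method that keeps the full matrices never drops that term.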

Contribution

The paper presents FLAME, an SMoE-based federated fine-tuning method that keeps the full global LoRA matrices and adapts to client resources by varying the number of activated experts per client. It addresses the challenges this design introduces through rescaling and aggregation.
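A minimal sketch of per-client expert activation follows. It assumes that each expert is a rank-1 LoRA adapter and that a linear softmax router selects the top-k experts. The names `LoRAExpert`, `smoe_lora_forward`, and the per-client budget `k` are hypothetical and do not come from the paper. The point it shows is that every client holds the same uncompressed experts, and only `k` changes.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over the router logits of all experts."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class LoRAExpert:
    """Hypothetical rank-1 LoRA expert computing b * (a . x)."""

    def __init__(self, a: Vector, b: Vector) -> None:
        self.a = a  # down-projection (d_in -> 1)
        self.b = b  # up-projection (1 -> d_out)

    def __call__(self, x: Vector) -> Vector:
        h = dot(self.a, x)
        return [bi * h for bi in self.b]


def smoe_lora_forward(x: Vector, router: List[Vector],
                      experts: List[LoRAExpert], k: int) -> Tuple[Vector, List[int]]:
    """Mix the outputs of the top-k experts chosen by a softmax router.

    Every client holds the same full expert set; only k differs per client.
    """
    gates = softmax([dot(w, x) for w in router])
    top = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:k]
    out = [0.0] * len(experts[0].b)
    for i in top:
        y = experts[i](x)
        out = [o + gates[i] * yi for o, yi in zip(out, y)]
    return out, top


# Toy setup: 4 experts, d_in = d_out = 2 (values are illustrative only).
router = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
experts = [LoRAExpert([1.0, 0.0], [1.0, 0.0]),
           LoRAExpert([0.0, 1.0], [0.0, 1.0]),
           LoRAExpert([1.0, 1.0], [0.5, 0.5]),
           LoRAExpert([1.0, -1.0], [1.0, 1.0])]
x = [1.0, 0.5]

small_out, small_top = smoe_lora_forward(x, router, experts, k=1)  # low-resource client
large_out, large_top = smoe_lora_forward(x, router, experts, k=4)  # high-resource client
```

A low-resource client pays for one expert per token and a high-resource client pays for four. Both read and update the same full-size expert matrices rather than a truncated copy.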

Findings

01. FLAME outperforms existing resource-adaptive federated fine-tuning methods.
02. It effectively handles output magnitude mismatch and expert imbalance.
03. It remains robust across diverse client computational settings.
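The abstract attributes the output magnitude mismatch to partial expert activation. The toy below uses 8 experts, a uniform router, and experts that all output 1.0. It shows that when softmax gates are computed over all experts and only the top-k are kept, the output shrinks with k. The `renormalize` option divides by the kept gate mass. This option is a hypothetical illustration of what "rescaling" can mean and is not FLAME's rescaling, which the truncated abstract does not describe.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over all experts' router logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_output(logits: List[float], expert_outputs: List[float], k: int,
                   renormalize: bool = False) -> float:
    """Scalar SMoE output using only the top-k experts.

    Gates come from a softmax over *all* experts, so for k < len(logits)
    the kept gates sum to less than 1 and the output shrinks with k.
    renormalize=True divides by the kept gate mass: a HYPOTHETICAL fix
    for illustration, not the rescaling used by FLAME.
    """
    gates = softmax(logits)
    top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:k]
    mass = sum(gates[i] for i in top)
    out = sum(gates[i] * expert_outputs[i] for i in top)
    return out / mass if renormalize else out


logits = [0.0] * 8   # uniform router over 8 experts (toy setting)
outputs = [1.0] * 8  # every expert emits 1.0

shrunk = [mixture_output(logits, outputs, k) for k in (2, 4, 8)]  # [0.25, 0.5, 1.0]
rescaled = [mixture_output(logits, outputs, k, renormalize=True)
            for k in (2, 4, 8)]                                   # [1.0, 1.0, 1.0]
```

A client that activates 2 of 8 experts sees outputs 4x smaller than the full mixture. Any rescaling scheme has to close this gap between a client's activation budget and the global model's.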

Abstract

Existing resource-adaptive LoRA federated fine-tuning methods let clients fine-tune models using compressed versions of the global LoRA matrices to accommodate varying compute resources across clients. This compression leads to suboptimal performance because of information loss. To address this, we propose FLAME, a novel federated learning framework based on the Sparse Mixture-of-Experts (SMoE) architecture. Unlike prior approaches, FLAME retains full (uncompressed) global LoRA matrices and achieves client-side adaptability by varying the number of activated experts per client. However, incorporating SMoE into federated learning introduces two unique challenges: the mismatch in output magnitude caused by partial expert activation, and the imbalance in expert training quality across clients. FLAME tackles these challenges through a lightweight rescaling…
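The abstract names a second challenge, imbalance in expert training quality across clients. A client with a small expert budget updates only a few experts, so its copies of the remaining experts stay at their old values. The toy below reduces each expert to a single scalar parameter. It shows how an unweighted per-expert mean dilutes updates to experts that the low-budget client never trained. The activation-count weighting is an assumed rule used only for illustration, because the abstract is truncated before FLAME's aggregation is described. `plain_mean` and `activation_weighted_mean` are hypothetical names.

```python
from typing import List


def plain_mean(client_params: List[List[float]]) -> List[float]:
    """Unweighted per-expert mean across clients."""
    n = len(client_params)
    return [sum(p[e] for p in client_params) / n
            for e in range(len(client_params[0]))]


def activation_weighted_mean(global_params: List[float],
                             client_params: List[List[float]],
                             client_counts: List[List[int]]) -> List[float]:
    """HYPOTHETICAL rule: weight each client's copy of an expert by how often
    that client activated it; experts no client activated keep the global
    value. An illustration only, not necessarily FLAME's aggregation."""
    new_params = []
    for e, g in enumerate(global_params):
        total = sum(counts[e] for counts in client_counts)
        if total == 0:
            new_params.append(g)  # nobody trained this expert this round
            continue
        weighted = sum(p[e] * c[e] for p, c in zip(client_params, client_counts))
        new_params.append(weighted / total)
    return new_params


# Toy round: 4 experts, each reduced to a single scalar parameter.
global_params = [0.0, 0.0, 0.0, 0.0]
client_params = [[1.0, 1.0, 1.0, 1.0],   # high-budget client trained all experts
                 [1.0, 0.0, 0.0, 0.0]]   # low-budget client trained only expert 0
client_counts = [[10, 10, 10, 10],
                 [20, 0, 0, 0]]

per_expert_updates = [sum(c[e] for c in client_counts) for e in range(4)]  # [30, 10, 10, 10]
naive = plain_mean(client_params)                                          # [1.0, 0.5, 0.5, 0.5]
weighted = activation_weighted_mean(global_params, client_params, client_counts)  # [1.0, 1.0, 1.0, 1.0]
```

The unweighted mean halves the update to experts 1 to 3, because the low-budget client's untouched copies count as votes for the old value. Weighting by activations avoids that in this toy, at the cost of letting the high-budget client alone determine those experts.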

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Stochastic Gradient Optimization Techniques