FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge

Gang Hu, Yinglei Teng, Pengfei Wu, and Nan Wang

arXiv:2508.18663·cs.LG·August 27, 2025

TL;DR

FFT-MoE introduces a sparse Mixture of Experts approach for federated fine-tuning of foundation models, enhancing adaptability, efficiency, and performance in heterogeneous edge environments without compromising privacy.

Contribution

The paper proposes replacing LoRA with sparse MoE adapters in federated fine-tuning, enabling personalized, resource-aware model adaptation and improved convergence in heterogeneous settings.
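To make the idea of a sparse MoE adapter concrete, the following is a minimal sketch of top-k routing over small bottleneck experts added on top of a frozen layer output. It is an illustration under stated assumptions, not the paper's implementation: the function names (`top_k_gate`, `moe_adapter`), the ReLU bottleneck expert shape, the residual connection, and the softmax renormalization over the selected experts are all choices made here for clarity.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def top_k_gate(logits, k):
    # Keep the k largest router logits and renormalize them with softmax.
    # Returns {expert_index: weight}; all other experts are skipped entirely.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))

def moe_adapter(x, gate_W, experts, k):
    # x: input vector from the frozen backbone (hypothetical setup).
    # gate_W: router matrix with one row per expert.
    # experts: list of (down, up) matrix pairs forming bottleneck adapters.
    gates = top_k_gate(matvec(gate_W, x), k)
    out = list(x)  # residual path: the backbone output passes through unchanged
    for idx, weight in gates.items():
        down, up = experts[idx]
        hidden = [max(0.0, v) for v in matvec(down, x)]  # ReLU bottleneck
        delta = matvec(up, hidden)
        out = [o + weight * d for o, d in zip(out, delta)]
    return out, gates

# Three experts over a 2-dimensional input, routing each input to one expert.
gate_W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
experts = [
    ([[1.0, 1.0]], [[0.1], [0.1]]),
    ([[1.0, 0.0]], [[0.5], [0.5]]),
    ([[0.0, 1.0]], [[0.2], [0.2]]),
]
output, gates = moe_adapter([1.0, 2.0], gate_W, experts, k=1)
```

In a sketch like this, a resource-constrained client could use a smaller `k` than a well-provisioned one, since only the selected experts are evaluated for a given input.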

Findings

01 Outperforms existing FFT methods in generalization accuracy.

02 Achieves higher training efficiency across diverse data distributions.

03 Ensures balanced expert utilization through heterogeneity-aware regularization.
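For intuition about what "balanced expert utilization" means in practice, here is a sketch of a generic load-balancing auxiliary loss of the kind commonly used with sparse MoE routers (the product of per-expert dispatch fraction and mean router probability, scaled by the number of experts). This is a swapped-in, widely used formulation shown for illustration only; the listing does not specify the paper's heterogeneity-aware regularizer, and the function name `load_balance_loss` is hypothetical.

```python
def load_balance_loss(router_probs, assignments, num_experts):
    # router_probs: one probability row per token, one column per expert.
    # assignments: index of the expert each token was dispatched to.
    n = len(router_probs)
    # Fraction of tokens actually dispatched to each expert.
    dispatch_frac = [0.0] * num_experts
    for expert in assignments:
        dispatch_frac[expert] += 1.0 / n
    # Mean router probability assigned to each expert.
    mean_prob = [sum(row[i] for row in router_probs) / n for i in range(num_experts)]
    # Equals 1.0 under perfectly uniform routing and grows as routing collapses.
    return num_experts * sum(f * p for f, p in zip(dispatch_frac, mean_prob))

# Uniform routing across two experts versus routing that collapses onto one.
balanced = load_balance_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], 2)
collapsed = load_balance_loss([[0.9, 0.1], [0.9, 0.1]], [0, 0], 2)
```

Adding a term like this to the training objective penalizes routers that send most inputs to a few experts, which is the failure mode a balance regularizer is meant to prevent.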

Abstract

As Foundation Models (FMs) drive progress toward Artificial General Intelligence (AGI), fine-tuning them under privacy and resource constraints has become increasingly critical, particularly when high-quality training data resides on distributed edge devices. Federated Learning (FL) offers a compelling solution through Federated Fine-Tuning (FFT), which enables collaborative model adaptation without sharing raw data. Recent approaches incorporate Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) to reduce computational overhead. However, LoRA-based FFT faces two major limitations in heterogeneous FL environments: structural incompatibility across clients with varying LoRA configurations, and limited adaptability to non-IID data distributions, which hinders convergence and generalization. To address these challenges, we propose FFT-MoE, a novel FFT framework that replaces…
