Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity

Zihan Fang; Qianru Wang; Haonan An; Zheng Lin; Yiqin Deng; Xianhao Chen; Yuguang Fang

arXiv:2603.21276 · cs.LG · March 24, 2026

Open Access

TL;DR

This paper introduces FedAlign-MoE, a federated learning framework that aligns routing and expert semantics in Mixture-of-Experts models, enabling effective training across heterogeneous, privacy-sensitive data sources.

Contribution

The paper proposes a novel aggregation method for MoE-based federated learning that maintains expert specialization and routing consistency across clients with diverse data distributions.

Findings

01. FedAlign-MoE achieves faster convergence in non-IID settings.

02. It outperforms existing methods in accuracy and stability.

03. The framework effectively preserves expert roles across clients.

Abstract

Large language models (LLMs) increasingly adopt Mixture-of-Experts (MoE) architectures to scale model capacity while reducing computation. Fine-tuning these MoE-based LLMs often requires access to distributed and privacy-sensitive data, making centralized fine-tuning impractical. Federated learning (FL) therefore provides a paradigm to collaboratively fine-tune MoE-based LLMs, enabling each client to integrate diverse knowledge without compromising data privacy. However, the integration of MoE-based LLM fine-tuning into FL encounters two critical aggregation challenges due to inherent data heterogeneity across clients: (i) divergent local data distributions drive clients to develop distinct gating preferences for localized expert selection, causing direct parameter aggregation to produce a "one-size-fits-none" global gating network, and (ii) same-indexed experts develop disparate…
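Challenge (i) can be illustrated with a toy sketch. This is not the paper's method or its experimental setup: the linear softmax gate, the two hypothetical clients, their weight values, and the uniform FedAvg-style averaging in `average_gates` are all assumptions chosen only to show how directly averaging gating parameters can erase each client's routing preference.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gate(weights, x):
    """Linear gate: one weight row per expert, routing probs = softmax(W x)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)

def average_gates(gates):
    """Uniform element-wise average of gate weights (FedAvg-style, assumed)."""
    n = len(gates)
    num_experts = len(gates[0])
    dim = len(gates[0][0])
    return [[sum(g[e][d] for g in gates) / n for d in range(dim)]
            for e in range(num_experts)]

# Hypothetical clients whose local data pushed their gates in opposite
# directions for the same kind of input.
client_a = [[4.0, 0.0], [-4.0, 0.0]]   # routes x to expert 0
client_b = [[-4.0, 0.0], [4.0, 0.0]]   # routes x to expert 1
x = [1.0, 0.0]

probs_a = gate(client_a, x)
probs_b = gate(client_b, x)

# Direct parameter aggregation of the two gating networks.
global_gate = average_gates([client_a, client_b])
probs_global = gate(global_gate, x)

print("client A:", probs_a)
print("client B:", probs_b)
print("averaged:", probs_global)
```

In this deliberately extreme case the two weight matrices cancel, so the averaged gate assigns equal probability to both experts and matches neither client's preference. Real gating networks would rarely cancel exactly, but the sketch shows the direction of the effect.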

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning