UB-SMoE: Universally Balanced Sparse Mixture-of-Experts for Resource-adaptive Federated Fine-tuning of Foundation Models

Van-Tuan Tran; Hong-Hanh Nguyen-Le; Marco Ruffini; Merim Dzaferagic

arXiv:2605.16690·cs.LG·May 19, 2026

TL;DR

UB-SMoE is a sparse Mixture-of-Experts method for resource-adaptive federated fine-tuning of foundation models. It balances expert utilization and reduces computation on low-resource clients.

Contribution

The paper proposes Dynamic Modulated Routing (DMR) and a Universal Pseudo-Gradient to address expert utilization imbalance and the non-differentiability of Top-K routing when sparse Mixture-of-Experts is applied to federated learning.

Findings

01. Achieves up to 45% computational reduction on low-resource clients.

02. Improves low-resource client performance by 8.7 times.

03. Outperforms existing heterogeneous LoRA-rank methods.

Abstract

Heterogeneous LoRA-rank methods address system heterogeneity in federated fine-tuning of foundation models by assigning client-specific ranks based on computational capabilities. However, these methods achieve only marginal computational savings, as dense feed-forward computations dominate. Sparse Mixture-of-Experts (SMoE) provides a promising alternative through conditional computation, yet we identify that its naive application to heterogeneous federated settings introduces two critical discordances: (i) expert utilization imbalance and (ii) non-differentiability of Top-K routing. Our convergence analysis demonstrates that these discordances lead to degraded convergence, particularly for resource-constrained clients. To address these challenges, we propose Universally Balanced Sparse Mixture-of-Experts (UB-SMoE), which introduces Dynamic Modulated Routing (DMR) to rebalance expert…
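The two discordances named in the abstract can be illustrated with a generic softmax Top-K router. The sketch below is not the paper's DMR or Universal Pseudo-Gradient; it is a minimal toy, and every name in it (`top_k_route`, `expert_utilization`, the biased router logits, and the idea that a low-resource client uses a smaller `k`) is an assumption made for illustration only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Return (indices, weights) for the k highest-scoring experts.

    Weights are softmax probabilities renormalised over the selected experts.
    """
    probs = softmax(logits)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return chosen, [probs[i] / mass for i in chosen]


def expert_utilization(batch_logits, k):
    """Fraction of routing slots assigned to each expert over a batch of tokens."""
    counts = [0] * len(batch_logits[0])
    for logits in batch_logits:
        chosen, _ = top_k_route(logits, k)
        for i in chosen:
            counts[i] += 1
    total = len(batch_logits) * k
    return [c / total for c in counts]


# Hypothetical setup: 4 experts, with a router that favours expert 0.
rng = random.Random(0)
bias = [2.0, 0.0, 0.0, 0.0]
tokens = [[b + rng.gauss(0.0, 1.0) for b in bias] for _ in range(1000)]

# Assumption: a low-resource client activates fewer experts per token.
util_k1 = expert_utilization(tokens, k=1)
util_k2 = expert_utilization(tokens, k=2)
print("utilization, k=1:", [round(u, 3) for u in util_k1])
print("utilization, k=2:", [round(u, 3) for u in util_k2])

# The selection is piecewise constant in the logits: it does not change
# until the ranking flips, so the hard Top-K choice carries no gradient.
before, _ = top_k_route([1.0, 0.99, 0.0, 0.0], k=1)
after, _ = top_k_route([1.0, 1.01, 0.0, 0.0], k=1)
print("selected expert before/after a small logit change:", before, after)
```

In this toy, the smaller `k` concentrates routing on the favoured expert, and the selected index jumps discontinuously when two logits cross. Both effects are properties of plain Top-K routing in general; how UB-SMoE counteracts them is described in the paper itself.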
