Federated Fine-Tuning of Sparsely-Activated Large Language Models on Resource-Constrained Devices

Fahao Chen; Jie Wan; Peng Li; Zhou Su; Dongxiao Yu

arXiv:2508.19078·cs.DC·October 13, 2025


TL;DR

FLUX enables federated fine-tuning of large, sparsely-activated language models on resource-limited devices through local profiling, expert merging, and role assignment, achieving up to a 4.75X speedup in time-to-accuracy.

Contribution

The paper presents FLUX, a system for federated fine-tuning of MoE-based LLMs on resource-constrained devices, built on three innovations: quantization-based local profiling, adaptive layer-aware expert merging, and role assignment.

Findings

01 Achieves up to 4.75X speedup in time-to-accuracy

02 Outperforms existing methods on benchmark datasets

03 Effectively balances tuning and non-tuning experts

Abstract

Federated fine-tuning of Mixture-of-Experts (MoE)-based large language models (LLMs) is challenging due to their massive computational requirements and the resource constraints of participants. Existing work attempts to fill this gap through model quantization, computation offloading, or expert pruning. However, these approaches cannot achieve the desired performance because they rely on impractical system assumptions and do not account for MoE-specific characteristics. In this paper, we propose FLUX, a system designed to enable federated fine-tuning of MoE-based LLMs across participants with constrained computing resources (e.g., consumer-grade GPUs), aiming to minimize time-to-accuracy. FLUX introduces three key innovations: (1) quantization-based local profiling to estimate expert activation with minimal overhead, (2) adaptive layer-aware expert merging to reduce resource consumption while preserving…
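
To make "estimating expert activation" concrete, here is a minimal sketch of how per-expert activation frequency could be counted for one MoE layer under top-k routing. It is an illustration only, not FLUX's profiling procedure: the function names, the choice of k = 2, and the router scores are all hypothetical, and in the setting the abstract describes the scores would come from a forward pass of a quantized copy of the model over a participant's local data.

```python
from collections import Counter


def top_k_experts(router_scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i],
                    reverse=True)
    return ranked[:k]


def activation_frequency(token_scores, num_experts, k=2):
    """Fraction of tokens routed to each expert under top-k routing.

    token_scores: one list of router scores per token (hypothetical input).
    """
    if not token_scores:
        return [0.0] * num_experts
    counts = Counter()
    for scores in token_scores:
        counts.update(top_k_experts(scores, k))
    total = len(token_scores)
    return [counts[e] / total for e in range(num_experts)]


# Hypothetical router scores for 4 tokens over 4 experts (assumed values).
scores = [
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.20, 0.50, 0.25, 0.05],
    [0.05, 0.80, 0.10, 0.05],
]
freq = activation_frequency(scores, num_experts=4, k=2)
print(freq)  # [0.25, 1.0, 0.75, 0.0]
```

In this toy input, expert 1 is selected for every token and expert 3 for none. A skewed profile of this kind is the sort of signal a system could use when deciding which experts to tune and which to merge.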
