Federated Fine-Tuning of Sparsely-Activated Large Language Models on Resource-Constrained Devices
Fahao Chen, Jie Wan, Peng Li, Zhou Su, Dongxiao Yu

TL;DR
FLUX enables federated fine-tuning of large, sparsely-activated language models on resource-limited devices by combining quantization-based local profiling, layer-aware expert merging, and expert role assignment, achieving up to a 4.75X speedup in time-to-accuracy.
Contribution
The paper presents FLUX, a novel system that addresses the challenges of federated fine-tuning of MoE-based LLMs on resource-constrained devices through three key innovations.
Findings
Achieves up to 4.75X speedup in time-to-accuracy
Outperforms existing methods on benchmark datasets
Effectively balances tuning and non-tuning experts
Abstract
Federated fine-tuning of Mixture-of-Experts (MoE)-based large language models (LLMs) is challenging due to their massive computational requirements and the resource constraints of participants. Existing work attempts to fill this gap through model quantization, computation offloading, or expert pruning. However, these approaches cannot achieve the desired performance due to impractical system assumptions and a lack of consideration for MoE-specific characteristics. In this paper, we propose FLUX, a system designed to enable federated fine-tuning of MoE-based LLMs across participants with constrained computing resources (e.g., consumer-grade GPUs), aiming to minimize time-to-accuracy. FLUX introduces three key innovations: (1) quantization-based local profiling to estimate expert activation with minimal overhead, (2) adaptive layer-aware expert merging to reduce resource consumption while preserving…
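To make "estimating expert activation" concrete, the sketch below counts how often each expert is selected by a top-k router over a batch of tokens. It is an illustrative assumption, not FLUX's profiling procedure: the function name `expert_activation_frequency`, the `router_logits` array, and the choice of `top_k=2` are all hypothetical, and the quantization step described in the abstract is not modeled.

```python
import numpy as np

def expert_activation_frequency(router_logits: np.ndarray, top_k: int = 2) -> np.ndarray:
    """Share of token-to-expert assignments that each expert receives.

    router_logits: hypothetical gate scores, shape (num_tokens, num_experts).
    """
    num_tokens, num_experts = router_logits.shape
    # Indices of the top_k highest-scoring experts for each token.
    top_experts = np.argsort(router_logits, axis=1)[:, -top_k:]
    # Count selections per expert; minlength keeps never-selected experts at 0.
    counts = np.bincount(top_experts.ravel(), minlength=num_experts)
    return counts / (num_tokens * top_k)

# Synthetic gate scores stand in for a real router's outputs (assumption).
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 8))
freq = expert_activation_frequency(logits, top_k=2)
```

A participant could use frequencies like these to see which experts its local data rarely activates; how FLUX turns such estimates into merging or role decisions is not specified in the text above.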
