Elastic Mixture of Rank-Wise Experts for Knowledge Reuse in Federated Fine-Tuning
Yebo Wu, Jingguang Li, Zhijiang Guo, Li Li

TL;DR
SmartFed is a resource-efficient federated fine-tuning framework for LLMs that reuses existing knowledge and adaptively allocates expert capacity, significantly improving performance and efficiency on resource-constrained devices.
Contribution
The paper introduces the Mixture of Rank-Wise Experts (MoRE), which decomposes LoRA modules into rank-level experts, and Elastic Expert Quota Allocation (EEQA), an adaptive method for allocating expert quotas, to improve the efficiency and scalability of federated fine-tuning.
Findings
SmartFed outperforms existing methods in multiple benchmarks.
It reduces training time and computational costs.
It effectively reuses existing LoRA modules for new tasks.
Abstract
Federated fine-tuning offers a promising solution for adapting Large Language Models (LLMs) to downstream tasks while safeguarding data privacy. However, its high computational and communication demands hinder its deployment on resource-constrained devices. In this paper, we propose SmartFed, a resource-efficient federated fine-tuning framework. SmartFed intelligently reuses knowledge embedded in existing LoRA modules, eliminating the need for expensive training from scratch when adapting LLMs to new tasks. To effectively exploit this knowledge and ensure scalability, we introduce the Mixture of Rank-Wise Experts (MoRE). MoRE decomposes LoRA modules into fine-grained rank-level experts. These experts are selectively activated and combined based on input semantics and resource budgets. Moreover, to optimize resource utilization, we present the Elastic Expert Quota Allocation (EEQA). EEQA…
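The rank-wise decomposition behind MoRE can be illustrated with a minimal sketch. It assumes the standard LoRA form ΔW = BA (B is d_out × r, A is r × d_in), so the update splits exactly into r rank-1 experts: column i of B paired with row i of A. Experts from several existing LoRAs are pooled, and a router activates only the top-k of them per token, with k standing in for the resource budget. The linear router, the softmax gating over selected experts, and the names `rank_wise_experts` and `more_forward` are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def rank_wise_experts(B, A):
    """Split a LoRA update Delta W = B @ A into rank-1 experts.

    B is d_out x r and A is r x d_in, both given as lists of rows.
    Expert i is (column i of B, row i of A), so
    Delta W @ x == sum_i B[:, i] * (A[i] . x).
    """
    return [([row[i] for row in B], A[i]) for i in range(len(A))]

def more_forward(x, experts, router_w, k):
    """Route one token x to its top-k rank-1 experts (hypothetical router).

    router_w[i] is an assumed linear scoring vector for expert i.
    Gates are a softmax over the selected experts' scores only.
    Returns the combined low-rank update and the chosen expert indices.
    """
    scores = [dot(w, x) for w in router_w]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)  # subtract the max for numerical stability
    weights = {i: math.exp(scores[i] - m) for i in top}
    z = sum(weights.values())
    out = [0.0] * len(experts[0][0])
    for i in top:
        b_col, a_row = experts[i]
        coeff = weights[i] / z * dot(a_row, x)  # gate_i * (a_i . x)
        for j, b in enumerate(b_col):
            out[j] += coeff * b
    return out, top

# Pool two existing LoRA modules (rank 2 and rank 1, d_in = d_out = 2)
# into a single set of 2 + 1 = 3 rank-wise experts.
pool = (rank_wise_experts(B=[[1.0, 0.0], [0.0, 1.0]], A=[[1.0, 0.0], [0.0, 1.0]])
        + rank_wise_experts(B=[[1.0], [1.0]], A=[[1.0, 1.0]]))
router_w = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, top = more_forward([2.0, 0.0], pool, router_w, k=1)
print(top, out)  # [0] [2.0, 0.0]
```

Because every expert is rank-1, activating all of them with equal gates reproduces the averaged full LoRA updates, while a small k evaluates only a few rank-1 products per token.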
Peer Reviews
Decision·ICLR 2026 Conference Withdrawn Submission
- The paper is well written and well motivated. - Improving the efficiency of federated learning at the edge appears to be a promising direction to explore. - The proposed SmartFed method is presented clearly; it is intuitive and easy to follow. - The experimental results seem reasonable.
- Using PEFT for federated learning is a somewhat crowded research area. Many methods have already been developed, as shown in the paper (with numerous baseline methods compared). Looking at the main results in Table 2, SmartFed appears to outperform other notable baselines, but it is difficult to assess the statistical significance of these improvements. For instance, how meaningful is an improvement of about two points on LLaMA2 on some benchmarks? That being said, SmartFed seems to make
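One generic way to probe the reviewer's significance question is a paired bootstrap over per-example scores, resampling benchmark items with replacement and counting how often the apparent winner fails to win. The sketch below is a standard technique, not something the paper reports; `paired_bootstrap` is a hypothetical helper, and its inputs are assumed to be 0/1 correctness scores of two methods on the same items.

```python
import random

def paired_bootstrap(correct_a, correct_b, n_resamples=10_000, seed=0):
    """Paired bootstrap comparison of two methods on the same benchmark items.

    correct_a / correct_b: per-example 0/1 scores, aligned item by item.
    Returns (observed accuracy gap a - b, fraction of resamples in which
    method a does NOT beat method b), the latter a one-sided p-value estimate.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both methods must be scored on the same items")
    n = len(correct_a)
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    observed = (sum(correct_a) - sum(correct_b)) / n
    not_better = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample items with replacement
        if sum(correct_a[i] - correct_b[i] for i in idx) <= 0:
            not_better += 1
    return observed, not_better / n_resamples
```

Whether a two-point gap is distinguishable from resampling noise depends on the number of items and on how often the two methods disagree; this estimate makes that dependence explicit.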
This is an interesting paradigm, but it requires more comprehensive evaluation to fully validate its generality and robustness.
1. Figure 4 lacks clarity and insight. The observations in Figure 4 are not intuitive and provide limited insight, as they are only evaluated on one or two layers. 2. On the optimal number of experts. Given the redundancy within LLMs, is there an optimal number of experts? I am skeptical that the results shown in Figure 5 would generalize across different tasks. 3. Router-only training raises concerns about generalization. In the proposed method, only the router is trained. It is unclear wheth
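To make the router-only setting concrete, the sketch below shows one plausible aggregation step, assuming standard sample-weighted FedAvg applied to router weights only. The rank-wise experts are frozen and never transmitted. `fedavg_router` is a hypothetical helper; this summary does not state SmartFed's actual aggregation rule.

```python
def fedavg_router(client_updates):
    """Sample-weighted average of client router weights (FedAvg-style).

    client_updates: list of (num_samples, router_w) pairs, where router_w holds
    one scoring vector per expert. The frozen rank-wise experts are not part of
    the update, so only router weights are averaged.
    """
    total = sum(n for n, _ in client_updates)
    n_experts = len(client_updates[0][1])
    d_in = len(client_updates[0][1][0])
    avg = [[0.0] * d_in for _ in range(n_experts)]
    for n, router_w in client_updates:
        for e in range(n_experts):
            for j in range(d_in):
                avg[e][j] += (n / total) * router_w[e][j]
    return avg

# Two clients holding 1 and 3 local samples.
print(fedavg_router([(1, [[0.0, 0.0]]), (3, [[4.0, 4.0]])]))  # [[3.0, 3.0]]
```

Since only the router is shared and trained, the reviewer's generalization concern amounts to asking whether re-weighting fixed experts can express what a new task needs.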
Creative reuse of LoRA: the router-based composition cleverly exploits LoRA’s rank-wise decomposability to mix skills without retraining full adapters. No from-scratch LoRA training: this slashes compute and communication for federated learning, making deployment far more practical on edge devices. Modular and scalable: sparse rank-wise activation plus adaptive quotas yields strong data efficiency and plug-and-play growth with LoRA libraries, improving latency and energy in real systems.
Heavy reliance on the quality and coverage of existing LoRA modules. If client data are private or domain-specific and no matching public LoRA exists, the method may misalign with the goals of federated learning; the experiments do not cover fully private, no-LoRA settings. System and tuning complexity. The router and EEQA require importance scoring, quota allocation, and top-K choices; performance may be sensitive to hyperparameters and client heterogeneity, and the paper offers limited robust
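The importance-scoring and quota-allocation steps the reviewer mentions can be sketched as follows. This is an assumption-laden illustration, not EEQA itself (its description is truncated in the abstract above). It assumes a global budget of active experts is split across layers in proportion to per-layer importance scores, with a minimum quota per layer and largest-remainder rounding so the quotas sum exactly to the budget. `allocate_quotas` and `min_quota` are hypothetical names.

```python
import math

def allocate_quotas(importance, budget, min_quota=1):
    """Split a global active-expert budget across layers (hypothetical sketch).

    importance: non-negative importance score per layer.
    budget: total number of experts that may be active across all layers.
    Each layer first receives min_quota; the spare budget is shared in
    proportion to importance, using largest-remainder rounding.
    """
    n_layers = len(importance)
    if budget < min_quota * n_layers:
        raise ValueError("budget too small for the minimum quota")
    spare = budget - min_quota * n_layers
    total = sum(importance)
    if total == 0:
        shares = [spare / n_layers] * n_layers  # no signal: split evenly
    else:
        shares = [spare * s / total for s in importance]
    quotas = [min_quota + math.floor(s) for s in shares]
    leftover = budget - sum(quotas)
    # Hand remaining slots to the layers with the largest fractional parts.
    order = sorted(range(n_layers), key=lambda i: shares[i] - math.floor(shares[i]),
                   reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    return quotas

print(allocate_quotas([1.0, 1.0, 2.0], budget=8))  # [2, 2, 4]
```

Even in this simplified form, the result depends on the budget, the minimum quota, and how importance is scored, which is the hyperparameter sensitivity the reviewer raises.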
1. Rank-wise expertization of LoRA for reuse (not retraining) is a crisp idea that avoids cross-task interference from naive merging and improves upon coarse MoLE-style routing. 2. Careful ablations show both MoRE and EEQA matter. Efficiency analyses quantify wall-clock, communication, and energy benefits. 3. Demonstrated improvements across models and tasks, including data-efficiency under 10% data regimes, are compelling for realistic federated scenarios.
1. The method presumes relevant, high-quality task LoRAs can be found and that their ranks or placements suit the target task. 2. Practical latency on device for per-token top-K over large expert pools (sum of ranks across many LoRAs) is not deeply profiled. 3. Experiments focus on LLaMA2-7B/13B and Qwen2-7B, which are now relatively dated choices for open LLM backbones.
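The reviewer's latency concern can be framed with back-of-envelope arithmetic. The sketch below assumes one rank-1 expert per LoRA rank (so the pool size is the sum of ranks), a linear router that scores every expert per token, and rank-1 experts costing d_in + d_out multiply-accumulates each; the top-K selection itself is not counted. `per_token_macs` is a hypothetical helper, and these counts are illustrative estimates, not measurements from the paper.

```python
def per_token_macs(lora_ranks, d_in, d_out, k):
    """Rough per-token, per-layer multiply-accumulate counts (assumed cost model)."""
    pool = sum(lora_ranks)                          # one rank-1 expert per LoRA rank
    router = pool * d_in                            # assumed linear router: one score per expert
    topk_experts = min(k, pool) * (d_in + d_out)    # a_i . x, then scale b_i
    dense = pool * (d_in + d_out)                   # evaluating every expert
    return {"pool": pool, "router": router, "topk_experts": topk_experts, "dense": dense}

print(per_token_macs([8] * 8, d_in=4096, d_out=4096, k=8))
```

For eight rank-8 LoRAs on a 4096-dimensional layer, the pool has 64 experts. Under this cost model, scoring them costs 262,144 MACs, more than the 65,536 MACs for the eight selected experts, so the routing step grows with the pool and would need on-device profiling.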
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Big Data and Digital Economy
