Fed MobiLLM: Efficient Federated LLM Fine-Tuning over Heterogeneous Mobile Devices via Server Assisted Side-Tuning
Xingke Yang, Liang Li, Sicong Li, Liwei Guan, Hao Wang, Xiaoqi Qi, Jiang Liu, Xin Fu, Miao Pan

TL;DR
Fed MobiLLM enables efficient federated fine-tuning of large language models on heterogeneous mobile devices through server-assisted side-tuning, significantly reducing on-device computation and communication costs while speeding up convergence.
Contribution
The paper introduces a novel server-assisted federated side-tuning approach for LLM fine-tuning on heterogeneous mobile devices, addressing computational and communication challenges.
Findings
Achieves over 95% reduction in computation overhead.
Reduces communication costs by over 93%.
Converges 5.1 times faster than existing methods.
Abstract
Collaboratively fine-tuning (FT) large language models (LLMs) over heterogeneous mobile devices fosters immense potential applications of personalized intelligence. However, such a vision faces critical system challenges. Conventional federated LLM FT approaches place prohibitive computational and memory burdens on mobile hardware, and their synchronous model aggregation protocols stall for slower devices. In this paper, we propose Fed MobiLLM, a novel design to facilitate efficient federated LLM FT across mobile devices with diverse computing/communication speeds and local model architectures. In particular, Fed MobiLLM implements a pioneering server-assisted federated side-tuning paradigm. Briefly, mobile devices perform lightweight forward propagation computations on local data using their frozen pre-scaled backbone LLMs, and then upload selected intermediate activations. The server…
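The abstract above is truncated before it describes the server's role, so the following is only a toy sketch of the device/server split it outlines. It assumes, as in side-tuning methods generally, that the server trains a lightweight side network on the uploaded activations. Everything else is hypothetical: a tiny frozen tanh MLP stands in for the backbone LLM, `tap_layers` plays the role of the "selected intermediate activations", and the side network is a plain linear regressor trained by SGD. None of the names (`make_backbone`, `client_forward`, `SideNetwork`, `run_demo`) come from the paper.

```python
import math
import random

# Hypothetical toy "backbone": fixed dense layers with tanh, standing in for a
# pre-trained LLM. Its weights are never updated (frozen on the device).
def make_backbone(num_layers, width, seed=0):
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(width)] for _ in range(width)]
            for _ in range(num_layers)]

def dense_tanh(weights, x):
    # One frozen layer: matrix-vector product followed by tanh.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def client_forward(backbone, x, tap_layers):
    """Device side: forward pass only, no backpropagation.
    Returns the concatenated activations of the layers in tap_layers,
    i.e. the payload the device would upload."""
    taps = []
    h = x
    for i, layer in enumerate(backbone):
        h = dense_tanh(layer, h)
        if i in tap_layers:
            taps.extend(h)
    return taps

class SideNetwork:
    """Server side (assumed): a lightweight linear model trained by SGD on
    the uploaded activations; the only trainable parameters in this toy."""
    def __init__(self, in_dim):
        self.w = [0.0] * in_dim
        self.b = 0.0

    def predict(self, feats):
        return sum(w * f for w, f in zip(self.w, feats)) + self.b

    def sgd_step(self, feats, target, lr):
        err = self.predict(feats) - target  # d(0.5 * err**2) / d(prediction)
        self.w = [w - lr * err * f for w, f in zip(self.w, feats)]
        self.b -= lr * err
        return 0.5 * err * err

def run_demo(epochs=200, lr=0.05):
    width, tap_layers = 4, {1, 3}
    backbone = make_backbone(num_layers=4, width=width)
    rng = random.Random(1)
    data = [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(8)]
    targets = [sum(x) for x in data]  # toy regression labels
    # Each sample is forwarded once; the frozen backbone makes activations reusable.
    uploads = [client_forward(backbone, x, tap_layers) for x in data]
    side = SideNetwork(in_dim=width * len(tap_layers))

    def total_loss():
        return sum(0.5 * (side.predict(u) - t) ** 2 for u, t in zip(uploads, targets))

    before = total_loss()
    for _ in range(epochs):
        for u, t in zip(uploads, targets):
            side.sgd_step(u, t, lr)
    return before, total_loss()
```

Calling `before, after = run_demo()` should show the server-side loss dropping while the device never computes a gradient. The design point the sketch illustrates is that, because the backbone is frozen, each device's work reduces to a single forward pass per sample, and all optimizer state lives on the server.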
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. This paper targets an important and timely problem. Enabling efficient federated fine-tuning of LLMs on resource-constrained edge devices is a critical and emerging challenge for distributed AI systems. 2. The proposed server-assisted side-tuning framework is conceptually novel. By decoupling client-side forward computation from server-side backpropagation, the design effectively addresses both device heterogeneity (computation/memory) and model heterogeneity (different backbone architecture
1. The motivation is present but not sufficiently compelling due to a lack of detailed descriptions and quantitative evidence. For instance, while the authors mention the memory overhead of large-scale models (line 50–52), they do not specify whether quantization techniques or bit-width configurations were used, which are crucial in real-world edge deployment. In addition, the motivation (Section 3) mainly discusses theoretical limitations of existing methods but lacks quantitative or visual com
- This work investigates how to train LLMs on federated learning more efficiently which is indeed a relevant topic to look into since devices are bottlenecked by their system capacity. - The work is written clearly and reads well. - The work has evaluated FedMobiLLM in many different aspects including reduction in on-device memory usage, computation cost, communication overhead, and convergence with different heterogeneity settings. - The work also particularly gives care on the reproducibility
- The paper is unclear on how the asynchronous updates actually differ from the standard vanilla FedAvg algorithm mathematically, and its guarantee of convergence. As the exact update rule for each client & server is not really laid out in the paper I'm unclear how this can affect the standard convergence of the algorithm of FedAvg. For instance, what if the global model that is frozen on the clients' device is stale, but the global model keeps updating the model with the local updates it receiv
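The reviewer's question contrasts asynchronous updates with vanilla FedAvg. The paper's actual update rule is not given on this page, so the sketch below is a generic illustration, not Fed MobiLLM's method. `fedavg` is plain synchronous weighted averaging over all clients. `async_mix` is a staleness-discounted mixing rule of the kind used by FedAsync: a client model that lags `s` global versions behind is mixed in with weight `alpha * (s + 1) ** (-a)`. The function names and the defaults `alpha = 0.5`, `a = 0.5` are hypothetical choices for illustration.

```python
from typing import List

Vector = List[float]

def fedavg(updates: List[Vector], weights: List[float]) -> Vector:
    """Synchronous FedAvg: wait for every client, then take a weighted average."""
    total = sum(weights)
    dim = len(updates[0])
    return [sum(w * u[i] for u, w in zip(updates, weights)) / total
            for i in range(dim)]

def async_mix(global_model: Vector, client_model: Vector, staleness: int,
              alpha: float = 0.5, a: float = 0.5) -> Vector:
    """Asynchronous update: mix in one client's model as soon as it arrives,
    discounting it by how many global versions it lags behind (staleness)."""
    alpha_t = alpha * (staleness + 1) ** (-a)  # polynomial staleness discount
    return [(1 - alpha_t) * g + alpha_t * c
            for g, c in zip(global_model, client_model)]
```

With the defaults, a fresh update (`staleness=0`) moves the global model halfway toward the client's model, whereas an update three versions stale (`staleness=3`) moves it only a quarter of the way. The FedAvg path has no such discount but must wait for the slowest client each round, which is the stalling behavior the abstract criticizes.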
The problem studied in this work is very interesting. Federated fine-tuning of LLM among mobile devices faces the memory and computation challenges. Addressing these issues is definitely important. This work is well written and the system diagram looks cool, which helps me to understand this work.
1. The proposed methodology seems to contradict the fundamental motivation of federated learning. The key idea in this work is to transmit intermediate activations from clients to the server, which then performs all parameter updates. Conceptually, this is nearly equivalent to uploading the raw data to a central server and performing centralized training. In the homogeneous model setting, following the authors’ logic, an even simpler and more efficient alternative would be to send the activation
1. Comprehensive Evaluation under Heterogeneity: The paper evaluates Fed MobiLLM under both device heterogeneity (e.g., varying model capacities) and data heterogeneity (non-IID distributions), demonstrating attention to realistic federated scenarios that many existing works overlook. 2. Relevance: The topic of reducing on-device resource consumption in federated learning is highly relevant, especially given the growing interest in deploying large models on edge or mobile devices. 3. Clarity
1. Weak Motivation and Questionable Privacy Rationale: The central motivation (delegating only the backward pass to the server) needs stronger justification. If privacy of the data is the primary concern, transmitting activations and labels to the server already exposes rich representational information that can be inverted to approximate original data (as shown in prior work on gradient and activation leakage attacks [1, 2]). Thus, the privacy gain from not sending raw data is limited. The auth
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices
