FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
Yao Zhang, Hewei Gao, Haokun Chen, Weiguo Li, Yunpu Ma, Volker Tresp

TL;DR
FedNano introduces a federated learning framework for multimodal large language models that centralizes the large model on the server and uses lightweight client modules, significantly reducing client resource requirements and communication costs.
Contribution
The paper presents FedNano, a novel FL framework that enables training large multimodal models without deploying full models on clients, using NanoEdge modules for efficient client adaptation.
Findings
FedNano reduces client storage by 95%.
Communication overhead is limited to 0.01% of model parameters.
Outperforms prior federated learning baselines for MLLMs.
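As a rough illustration of why low-rank adapters keep the transmitted payload small, the sketch below counts the parameters in a rank-r factorization of a single dense projection and compares that with the dense matrix itself. The dimensions, the rank, and the function names are hypothetical and are not taken from the paper; the percentages in the findings above refer to the full model, not to this toy layer.

```python
# Hypothetical illustration only: the sizes below are NOT FedNano's actual configuration.

def low_rank_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a rank-`rank` update B @ A, with A: (rank x d_in) and B: (d_out x rank)."""
    return rank * (d_in + d_out)


def fraction_of_dense(d_in: int, d_out: int, rank: int) -> float:
    """Size of the low-rank update relative to a dense (d_out x d_in) weight matrix."""
    return low_rank_param_count(d_in, d_out, rank) / (d_in * d_out)


if __name__ == "__main__":
    # Assumed example sizes: one 4096 x 4096 projection with a rank-8 adapter.
    print(low_rank_param_count(4096, 4096, 8))
    print(fraction_of_dense(4096, 4096, 8))
```

Only the two small factors would need to be exchanged in such a scheme, which is the general reason adapter-based federated tuning can keep communication far below the size of the backbone.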
Abstract
Multimodal Large Language Models (MLLMs) excel in tasks like multimodal reasoning and cross-modal retrieval but face deployment challenges in real-world scenarios due to distributed multimodal data and strict privacy requirements. Federated Learning (FL) offers a solution by enabling collaborative model training without centralizing data. However, realizing FL for MLLMs presents significant challenges, including high computational demands, limited client capacity, substantial communication costs, and heterogeneous client data. Existing FL methods assume client-side deployment of full models, an assumption that breaks down for large-scale MLLMs due to their massive size and communication demands. To address these limitations, we propose FedNano, the first FL framework that centralizes the LLM on the server while introducing NanoEdge, a lightweight module for client-specific adaptation.…
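The reviews on this page refer to a Fisher-guided merging step for combining client updates. The listing does not describe how it is computed, so the following is only a minimal sketch of one common form of the idea, per-coordinate averaging weighted by diagonal Fisher information estimates. The function name, the flat-list parameter layout, and the `eps` constant are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of diagonal-Fisher-weighted averaging of client adapter parameters.
# Not taken from the FedNano code; names and the eps default are assumptions.

def fisher_weighted_merge(client_params, client_fishers, eps=1e-8):
    """Merge flat parameter lists, weighting each coordinate by its Fisher estimate.

    client_params:  list of per-client parameter lists (all the same length)
    client_fishers: list of per-client non-negative diagonal Fisher estimates
    """
    n_params = len(client_params[0])
    merged = []
    for j in range(n_params):
        # Coordinates a client is more "certain" about (larger Fisher value)
        # contribute more to the merged value.
        num = sum(f[j] * p[j] for p, f in zip(client_params, client_fishers))
        den = sum(f[j] for f in client_fishers)
        merged.append(num / (den + eps))  # eps guards against an all-zero denominator
    return merged


if __name__ == "__main__":
    params = [[1.0, 0.0], [3.0, 4.0]]    # two clients, two adapter parameters each
    fishers = [[1.0, 1.0], [1.0, 3.0]]   # assumed diagonal Fisher estimates
    print(fisher_weighted_merge(params, fishers))
```

With equal Fisher values this reduces to a plain average of the client parameters, which is one way to see how such a scheme would relate to uniform federated averaging.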
Peer Reviews
Decision · Submitted to ICLR 2026
1. This paper proposes the first FL paradigm that avoids placing LLMs on clients while enabling collaborative tuning for MLLMs, which is both practically motivated and technically interesting. 2. This work quantitatively reduces client parameters and communication volume by 95%+ and 99%+, respectively. The improvements are clearly reported. 3. Despite limited trainable parameters, FedNano achieves higher or comparable accuracy to heavier baselines across datasets, backbones, and client numbers.
1. Although this work has a clear modular architecture (LLM on server + NanoEdge on clients), key design choices such as the exact placement of NanoAdapters, adapter architecture (rank, dimension per modality), and training pipeline (forward/backward interaction with server LLM) remain insufficiently described for full reproducibility. 2. The Fisher-guided merging is claimed as a core innovation; however, there is no explicit ablation of “FedNano w/o Fisher” vs “with Fisher”, making it unclear h
The paper tackles an important and timely problem—enabling scalable FL for large multimodal models under resource constraints.
1. While the paper addresses an important problem, its novelty claim is somewhat overstated. The idea of centralizing the LLM on the server while training lightweight client modules is not new; for instance, MLLM-LLaVA-FL[ref1] already adopts a similar design. The paper should clarify how its approach differs from such prior work. 2. Since the client-side adapters are trained locally without interacting with or being guided by the LLM’s frozen weights, it is unclear whether the learned representa
1. A new FL architecture for MLLMs that keeps the LLM frozen on the server and lets clients adapt via a lightweight “NanoEdge,” cutting client storage by >95%. 2. Communication-efficient adaptation using low-rank NanoAdapters, reducing transmitted parameters by >99%.
1. The experimental results look odd to the reviewer: why do they appear to vary randomly as the number of clients changes? 2. The paper essentially combines several existing techniques, which leads to a lack of novelty. However, the final pipeline is practical, so this does not look like a big issue to the reviewer.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
