FedUMM: A General Framework for Federated Learning with Unified Multimodal Models

Zhaolong Su; Leheng Zhao; Xiaoying Wu; Ziyue Xu; Jindong Wang

arXiv:2601.15390·cs.LG·January 23, 2026

TL;DR

FedUMM introduces a federated learning framework for unified multimodal models that maintains performance while significantly reducing communication costs, enabling privacy-preserving distributed training.

Contribution

This paper proposes FedUMM, a novel federated learning approach for UMMs using parameter-efficient fine-tuning with adapters, addressing privacy and communication challenges.
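The server-side step described above, where only adapter updates are combined, can be sketched as a sample-weighted average. This is a minimal illustration and not the paper's implementation: the weighting rule (FedAvg-style), the `Adapter` type, and the function name `aggregate_adapters` are all assumptions, and the sketch does not use NVIDIA FLARE.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each adapter tensor is flattened to a list
# of floats and keyed by a parameter name such as "lora_A".
Adapter = Dict[str, List[float]]


def aggregate_adapters(updates: List[Tuple[Adapter, int]]) -> Adapter:
    """Average client adapters, weighting each by its local sample count.

    Assumes a FedAvg-style rule; the frozen backbone is never transmitted,
    so only these small adapter tensors are combined.
    """
    if not updates:
        raise ValueError("no client updates to aggregate")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    result: Adapter = {}
    for name, reference in updates[0][0].items():
        acc = [0.0] * len(reference)
        for adapter, n in updates:
            weight = n / total
            for i, value in enumerate(adapter[name]):
                acc[i] += weight * value
        result[name] = acc
    return result


if __name__ == "__main__":
    client_a = ({"lora_A": [1.0, 2.0]}, 1)  # 1 local sample
    client_b = ({"lora_A": [3.0, 4.0]}, 3)  # 3 local samples
    print(aggregate_adapters([client_a, client_b]))
```

Because only the adapter tensors are exchanged, the per-round payload scales with the adapter size rather than with the size of the frozen backbone.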

Findings

01. Achieves performance competitive with centralized training.

02. Reduces communication cost by more than an order of magnitude.

03. Remains robust under data heterogeneity.

Abstract

Unified multimodal models (UMMs) are emerging as strong foundation models that can perform both generation and understanding tasks in a single architecture. However, they are typically trained in centralized settings where all training and downstream datasets are gathered on a central server, which limits their deployment in privacy-sensitive and geographically distributed scenarios. In this paper, we present FedUMM, a general federated learning framework for UMMs under non-IID multimodal data with low communication cost. Built on NVIDIA FLARE, FedUMM instantiates federation for a BLIP3o backbone via parameter-efficient fine-tuning: clients train lightweight LoRA adapters while keeping the foundation model frozen, and the server aggregates only adapter updates. We evaluate on VQA v2 and the GenEval compositional generation benchmarks under Dirichlet-controlled heterogeneity with up to 16 clients.…
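Dirichlet-controlled heterogeneity is commonly simulated by drawing, for each category, a vector of client proportions from a Dirichlet distribution and splitting that category's samples accordingly. The sketch below illustrates that general recipe only; the abstract does not say how categories are defined for VQA v2 or GenEval, so `labels`, `alpha`, and the function name `dirichlet_partition` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_partition(
    labels: Sequence[int], num_clients: int, alpha: float, seed: int = 0
) -> List[List[int]]:
    """Split sample indices across clients with Dirichlet(alpha) label skew.

    Smaller alpha gives more skewed (more non-IID) client datasets.
    `labels` is an assumed per-sample category id.
    """
    if num_clients <= 0 or alpha <= 0:
        raise ValueError("num_clients and alpha must be positive")
    rng = random.Random(seed)

    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        # A Dirichlet sample is a set of Gamma(alpha, 1) draws, normalized.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        if total == 0.0:  # guard against underflow for very small alpha
            draws, total = [1.0] * num_clients, float(num_clients)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += draws[c] / total
            # The last client takes the remainder so no sample is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cumulative * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    toy_labels = [i % 4 for i in range(200)]  # 4 hypothetical categories
    parts = dirichlet_partition(toy_labels, num_clients=16, alpha=0.5, seed=1)
    print([len(p) for p in parts])
```

Every index is assigned to exactly one client, so the union of the client shards reproduces the full dataset regardless of `alpha`.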

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks · Big Data and Digital Economy