FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data

Binqian Xu, Xiangbo Shu, Haiyang Mei, Guosen Xie, Basura Fernando, and Jinhui Tang

arXiv:2411.14717·cs.LG·March 11, 2025

TL;DR

This paper introduces FedMLLM, a federated fine-tuning framework for multimodal large language models that addresses multimodal heterogeneity, providing a benchmark and demonstrating improved performance in privacy-sensitive, heterogeneous data scenarios.

Contribution

The paper presents a new benchmark for federated fine-tuning of MLLMs on heterogeneous multimodal data and proposes a general FedMLLM framework with modality-agnostic strategies.

Findings

01. Benchmark covers diverse multimodal heterogeneity scenarios.

02. FedMLLM improves MLLM performance across multiple datasets.

03. Framework effectively mitigates multimodal heterogeneity challenges.

Abstract

Multimodal Large Language Models (MLLMs) have made significant advancements, demonstrating powerful capabilities in processing and understanding multimodal data. Fine-tuning MLLMs with Federated Learning (FL) allows for expanding the training data scope by including private data sources, thereby enhancing their practical applicability in privacy-sensitive domains. However, current research remains in the early stage, particularly in addressing the multimodal heterogeneities in real-world applications. In this paper, we introduce a benchmark to evaluate the performance of federated fine-tuning of MLLMs across various multimodal heterogeneous scenarios, laying the groundwork for future research in the field. Our benchmark includes two lightweight MLLMs, two downstream tasks, three evaluation metrics, and five datasets across three domains, along with six comparison baselines,…
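
For readers new to federated fine-tuning, the server-side aggregation step can be sketched as a sample-weighted average of the parameters each client trains locally. This is the generic FedAvg rule, shown only as background; it is not taken from the paper and may differ from what FedMLLM or its baselines do. The function name `fedavg`, the parameter name `adapter.weight`, and the toy values are all hypothetical.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat list
# of floats. Real frameworks use tensors; plain lists keep the sketch
# dependency-free.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count.

    Generic FedAvg aggregation (an assumption for illustration, not the
    paper's method).
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Toy example: two clients holding 1 and 3 local samples respectively.
clients = [
    {"adapter.weight": [1.0, 2.0]},
    {"adapter.weight": [3.0, 6.0]},
]
sizes = [1, 3]
global_params = fedavg(clients, sizes)
print(global_params)
```

Only the trained parameters leave each client, never the raw images or text, which is what makes the setup attractive for private data sources. Weighting by sample count means clients with more data pull the global model further toward their local solution.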

Code & Models

Repositories

1xbq1/fedmllm (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis