Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data

Thu Hang Phung; Duong M. Nguyen; Thanh Trung Huynh; Quoc Viet Hung Nguyen; Trong Nghia Hoang; and Phi Le Nguyen

arXiv:2602.07081 · cs.MM · February 10, 2026

Open Access

TL;DR

This paper proposes a federated prompt-tuning framework for clients whose multimodal data are heterogeneous and incomplete. It aligns and aggregates prompts across clients with differing data distributions to improve multimodal learning.

Contribution

The paper introduces a federated prompt-tuning approach that handles heterogeneous and incomplete multimodal data, bridging federated learning and multimodal prompt-tuning.

Findings

01 · Outperforms state-of-the-art baselines on diverse benchmarks.

02 · Effectively aligns prompt instructions across clients and modalities.

03 · Handles missing features in multi-modal federated datasets.

Abstract

This paper introduces a generalized federated prompt-tuning framework for practical scenarios where local datasets are multi-modal and exhibit different distributional patterns of missing features at the input level. The proposed framework bridges the gap between federated learning and multi-modal prompt-tuning, which have traditionally focused on either uni-modal or centralized data. A key challenge in this setting is the lack of semantic alignment between prompt instructions that encode similar distributional patterns of missing data on different clients. To address this, our framework introduces specialized client-tuning and server-aggregation designs that simultaneously optimize, align, and aggregate prompt-tuning instructions across clients and data modalities. This allows prompt instructions to complement one another and be combined effectively. Extensive evaluations…
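To make the server-aggregation idea concrete, the sketch below shows a plain sample-weighted average (FedAvg-style) of prompt vectors, grouped by missing-modality pattern. It is a generic baseline for illustration only, not the alignment-aware aggregation the paper proposes. The function name `aggregate_prompts`, the pattern labels, and the data layout are all assumptions introduced here.

```python
from collections import defaultdict


def aggregate_prompts(client_updates):
    """Sample-weighted average of prompt vectors per missing-modality pattern.

    Hypothetical layout: each client update is a dict mapping a pattern name
    (e.g. "text_only") to a tuple (prompt_vector, num_samples). A client only
    reports the patterns that occur in its local data.
    """
    sums = {}                   # pattern -> running weighted sum of prompts
    counts = defaultdict(int)   # pattern -> total number of samples seen

    for update in client_updates:
        for pattern, (prompt, n) in update.items():
            if n <= 0:
                continue  # a client with no samples contributes nothing
            if pattern not in sums:
                sums[pattern] = [0.0] * len(prompt)
            elif len(prompt) != len(sums[pattern]):
                raise ValueError(f"prompt length mismatch for {pattern!r}")
            for i, value in enumerate(prompt):
                sums[pattern][i] += n * value
            counts[pattern] += n

    # Normalize each pattern's sum by the samples that contributed to it.
    return {p: [s / counts[p] for s in vec] for p, vec in sums.items()}


# Two hypothetical clients with different missing-modality patterns.
clients = [
    {"text_only": ([1.0, 0.0], 10), "image_and_text": ([0.0, 2.0], 30)},
    {"image_and_text": ([4.0, 2.0], 10)},
]
global_prompts = aggregate_prompts(clients)
```

Averaging coordinate by coordinate presumes that prompts for the same pattern are already comparable across clients. The abstract identifies the lack of such semantic alignment as the key challenge, which is why the proposed framework optimizes, aligns, and aggregates prompts jointly rather than averaging them directly.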

Taxonomy

Topics: Context-Aware Activity Recognition Systems · IoT and Edge/Fog Computing · Mobile Crowdsensing and Crowdsourcing