PID-Guided Partial Alignment for Multimodal Decentralized Federated Learning

Yanhang Shi; Xiaoyu Wang; Houwei Cao; Jian Li; Yong Liu

arXiv:2601.10012 · cs.LG · January 16, 2026

Open Access · 4 Reviews

TL;DR

This paper introduces PARSE, a novel partial alignment framework for multimodal decentralized federated learning that enables heterogeneous agents to share only semantically relevant features, overcoming gradient conflicts without central coordination.

Contribution

PARSE applies partial information decomposition to facilitate slice-level semantic sharing among heterogeneous agents in decentralized federated learning, eliminating the need for gradient surgery.

Findings

01. Consistent performance improvements over baseline methods across various benchmarks.

02. Effective resolution of gradient conflicts between uni- and multimodal agents.

03. Robustness demonstrated through ablations and qualitative visualizations.

Abstract

Multimodal decentralized federated learning (DFL) is challenging because agents differ in available modalities and model architectures, yet must collaborate over peer-to-peer (P2P) networks without a central coordinator. Standard multimodal pipelines learn a single shared embedding across all modalities. In DFL, such a monolithic representation induces gradient misalignment between uni- and multimodal agents; as a result, it suppresses heterogeneous sharing and cross-modal interaction. We present PARSE, a multimodal DFL framework that operationalizes partial information decomposition (PID) in a server-free setting. Each agent performs feature fission to factorize its latent representation into redundant, unique, and synergistic slices. P2P knowledge sharing among heterogeneous agents is enabled by slice-level partial alignment: only semantically shareable branches are exchanged among…
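The slice-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `feature_fission` and `partial_align`, the fixed contiguous split (standing in for a learned factorization), the plain averaging rule, and the choice of which slices count as shareable are all assumptions made here for illustration. The abstract as shown is truncated before it specifies what is exchanged, so the shareable set is left as a caller-supplied parameter.

```python
from typing import Dict, List, Set

# A "slice" maps a PID component name to a vector. Plain lists stand in for
# the branch parameters or features an agent would actually hold.
Slices = Dict[str, List[float]]


def feature_fission(latent: List[float], sizes: Dict[str, int]) -> Slices:
    """Split a latent vector into named contiguous slices.

    Assumption: a fixed contiguous layout replaces the learned
    factorization described in the abstract.
    """
    if sum(sizes.values()) != len(latent):
        raise ValueError("slice sizes must sum to the latent dimension")
    slices: Slices = {}
    start = 0
    for name, width in sizes.items():
        slices[name] = latent[start:start + width]
        start += width
    return slices


def partial_align(local: Slices, peers: List[Slices], shareable: Set[str]) -> Slices:
    """Average only the shareable slices with peers that also hold them.

    Slices outside `shareable`, or held by no peer, stay local and unchanged.
    Assumption: uniform averaging is a placeholder for the real exchange rule.
    """
    merged: Slices = {}
    for name, values in local.items():
        holders = [p[name] for p in peers if name in p] if name in shareable else []
        if not holders:
            merged[name] = list(values)
            continue
        group = [values] + holders
        merged[name] = [sum(column) / len(group) for column in zip(*group)]
    return merged


# Hypothetical multimodal agent with three slices of width 2 each.
agent = feature_fission(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    {"redundant": 2, "unique": 2, "synergistic": 2},
)
# Hypothetical unimodal peer: it has no synergistic slice at all.
peer = {"redundant": [3.0, 4.0], "unique": [9.0, 9.0]}
merged = partial_align(agent, [peer], shareable={"redundant"})
```

Under these assumptions, only the `redundant` slice is averaged with the peer; the `unique` and `synergistic` slices never leave the agent, which is the sense in which the alignment is partial rather than monolithic.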

Peer Reviews

Decision: ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 4

Strengths

(1) Conceptual clarity and intuitiveness: The proposed PID-based feature decomposition combined with partial alignment is conceptually simple yet elegant. It provides a clear and interpretable mechanism for handling cross-modal heterogeneity in decentralized settings.

(2) Strong presentation and experimental design: The paper is well-written and well-structured, with comprehensive experiments and clear visualizations. The figures and tables effectively illustrate the advantages of the method, and

Weaknesses

(1) The method assumes that all agents solve the same underlying task (i.e., share the same label space). However, in many realistic multimodal decentralized scenarios, agents may work on related but distinct tasks. How would PARSE handle task heterogeneity? Would the PID-based decomposition and partial alignment still maintain consistent feature semantics across agents?

(2) The key designs (PID-based feature fission and slice-level alignment) could, in principle, also benefit centralized or server-

Reviewer 02 · Rating 2 · Confidence 4

Strengths

* Good problem statement and clear motivation

* Good writing, easy to follow

Weaknesses

* Novelty: The concept of modality decomposition, including PID-based variants, has been explored in prior work [1,2]. The authors should clearly articulate how PARSE advances beyond existing approaches and specify its distinctive contributions.

* Literature Review: The literature review should be broadened to encompass centralized multimodal learning methods or federated multimodal learning [1,2], not solely multimodal DFL. The authors are encouraged to discuss the challenges of applying centra

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. PARSE employs a novel approach to knowledge sharing by partitioning data features into three slices. This method is quite intriguing, as it successfully facilitates knowledge sharing and transmission through the alignment of these slices.

2. The research focuses on the issue of modal heterogeneity among agents, a relatively new field that provides significant impetus for the advancement of multimodal federated learning.

Weaknesses

1. This paper employs Partial Information Decomposition to partition features into three slices. Is there a theoretical explanation supporting its effectiveness in the multimodal domain? How is the specific feature partitioning process conducted?

2. The article mentions achieving knowledge sharing through feature fission, yet the specific design involves aggregating modules from the same modality model. For example, the optimization directions for single-modal and multi-modal clients sharing the

Reviewer 04 · Rating 2 · Confidence 5

Strengths

This paper uses information decomposition to address modality heterogeneity among clients, which is an interesting and intuitive approach. The experimental section appears clear, including various datasets, modality heterogeneity settings, class heterogeneity settings, different topologies, ablation studies, etc.

Weaknesses

1. In my opinion, the biggest flaw of the proposed method is that it is impractical; in other words, no one would use this method to train models in the real world. The reason is that the method proposed in this paper modifies the model structure rather than being a pure FL algorithm. In contrast, almost all baselines compared in the paper are FL algorithms that are general to all models. This leads to the following consequences:

- It cannot be adapted to any currently popular models without

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning