TL;DR
This paper introduces a federated learning framework in which learnable client-side embeddings reconfigure and align multimodal representations across clients with heterogeneous and incomplete data, with reported gains of up to 36.45% under severe data incompleteness.
Contribution
The paper proposes a federated learning approach in which locally adaptive representations, steered by client-specific reconfiguration signals, handle diverse missing-data patterns across clients.
Findings
Achieves up to 36.45% performance improvement under severe data incompleteness.
Demonstrates effectiveness across multiple federated multimodal benchmarks.
Provides theoretical analysis with explicit performance bounds.
Abstract
Multimodal federated learning in real-world settings often encounters incomplete and heterogeneous data across clients. This results in misaligned local feature representations that limit the effectiveness of model aggregation. Unlike prior work that assumes either differing modality sets without missing input features or a shared modality set with missing features across clients, we consider a more general and realistic setting where each client observes a different subset of modalities and might also have missing input features within each modality. To address the resulting misalignment in learned representations, we propose a new federated learning framework featuring locally adaptive representations based on learnable client-side embedding controls that encode each client's data-missing patterns. These embeddings serve as reconfiguration signals that align the globally aggregated…
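The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the embeddings act on the representation, so the FiLM-style scale-and-shift in `reconfigure`, the zero-filling of missing features, the sample-size-weighted averaging in `fedavg`, and every function and variable name here are assumptions chosen for illustration. The sketch only shows the division of labor: shared encoder weights are aggregated across clients, while each client's embedding stays local and adjusts the shared representation.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of the shared weight matrices (assumed aggregation rule)."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)               # (num_clients, d_out, d_in)
    return np.tensordot(sizes / sizes.sum(), stacked, axes=1)

def reconfigure(h, client_embedding):
    """Hypothetical FiLM-style modulation: scale and shift h using a local embedding."""
    gamma, beta = np.split(client_embedding, 2)      # first half scales, second half shifts
    return h * (1.0 + gamma) + beta

def client_forward(x, mask, shared_w, client_embedding):
    """Encode one sample on a client whose inputs are partly missing."""
    x_observed = np.where(mask, x, 0.0)              # zero-fill missing features (assumption)
    h = shared_w @ x_observed                        # shared, globally aggregated encoder
    return reconfigure(h, client_embedding)          # client-specific alignment step

# Two clients contribute encoder weights; only these weights are aggregated.
shared_w = fedavg([np.ones((2, 3)), 3 * np.ones((2, 3))], client_sizes=[1, 3])

# A client that is missing the second input feature applies its own embedding.
x = np.array([1.0, 2.0, 3.0])
mask = np.array([True, False, True])
embedding = np.array([1.0, 0.0, 0.5, -1.0])          # stays on the client, never averaged
h_local = client_forward(x, mask, shared_w, embedding)
```

With an all-zero embedding, `reconfigure` is the identity, so a client with nothing to correct falls back to the plain shared representation. In a real system the embedding would be trained locally alongside the client's other parameters.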