SheafAlign: A Sheaf-theoretic Framework for Decentralized Multimodal Alignment
Abdulmomen Ghalkha, Zhuojun Tian, Chaouki Ben Issaid, and Mehdi Bennis

TL;DR
SheafAlign introduces a sheaf-theoretic, decentralized framework for multimodal alignment that handles non-redundant modalities, improving robustness and reducing communication costs in real-world scenarios.
Contribution
It presents a novel sheaf-theoretic approach for decentralized multimodal alignment that does not rely on mutual redundancy among all modalities.
Findings
Superior zero-shot generalization and cross-modal alignment
Robustness to missing modalities
50% lower communication cost than baselines
Abstract
Conventional multimodal alignment methods assume mutual redundancy across all modalities, an assumption that fails in real-world distributed scenarios. We propose SheafAlign, a sheaf-theoretic framework for decentralized multimodal alignment that replaces single-space alignment with multiple comparison spaces. This approach models pairwise modality relations through sheaf structures and leverages decentralized contrastive learning-based objectives for training. SheafAlign overcomes the limitations of prior methods by not requiring mutual redundancy among all modalities, preserving both shared and unique information. Experiments on multimodal sensing datasets show superior zero-shot generalization, cross-modal alignment, and robustness to missing modalities, with 50% lower communication cost than state-of-the-art baselines.
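The idea of comparing two modalities in a pairwise comparison space, rather than in one space shared by all modalities, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the objective or the form of the sheaf maps, so the sketch assumes linear restriction maps (the hypothetical matrices `F_i` and `F_j`) that project each node's embedding into an edge-specific space, and a standard symmetric InfoNCE loss as the contrastive objective. All function names and dimensions are illustrative.

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss; row k of `a` is the positive pair of row k of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # cosine similarities scaled by temperature

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))  # positives sit on the diagonal

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def edge_loss(h_i, h_j, F_i, F_j, temperature=0.1):
    """Contrast two modalities inside the comparison space of one edge (i, j).

    F_i and F_j are assumed linear restriction maps (hypothetical), one per
    endpoint of the edge. Each modality keeps its own embedding space; only
    the projections are compared.
    """
    return info_nce(h_i @ F_i.T, h_j @ F_j.T, temperature)

# Toy data: two modalities with different embedding sizes and a small edge space.
rng = np.random.default_rng(0)
batch, d_i, d_j, d_edge = 8, 16, 12, 4
h_i = rng.normal(size=(batch, d_i))   # embeddings of modality i
h_j = rng.normal(size=(batch, d_j))   # embeddings of modality j
F_i = rng.normal(size=(d_edge, d_i))  # restriction map for node i on edge (i, j)
F_j = rng.normal(size=(d_edge, d_j))  # restriction map for node j on edge (i, j)

loss = edge_loss(h_i, h_j, F_i, F_j)
print(f"edge loss: {loss:.4f}")
```

In a decentralized setting, one plausible arrangement is that each node exchanges only the low-dimensional projections for its edges rather than full embeddings, but how SheafAlign actually achieves its communication savings is not described in the abstract.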
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Indoor and Outdoor Localization Technologies
