CrossWeaver: Cross-modal Weaving for Arbitrary-Modality Semantic Segmentation
Zelin Zhang, Kedi Li, Huiqi Liang, Tao Zhang, Chuanzhi Xu

TL;DR
CrossWeaver introduces a flexible multimodal fusion framework with a Modality Interaction Block and Seam-Aligned Fusion, achieving state-of-the-art results in arbitrary-modality semantic segmentation.
Contribution
CrossWeaver presents a simple fusion framework that strengthens cross-modal interaction and generalizes well to unseen modality combinations.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Uses minimal additional parameters compared to existing methods.
Demonstrates strong generalization to unseen modality combinations.
Abstract
Multimodal semantic segmentation has shown great potential in leveraging complementary information across diverse sensing modalities. However, existing approaches often rely on carefully designed fusion strategies that either use modality-specific adaptations or rely on loosely coupled interactions, thereby limiting flexibility and resulting in less effective cross-modal coordination. Moreover, these methods often struggle to balance efficient information exchange with preserving the unique characteristics of each modality across different modality combinations. To address these challenges, we propose CrossWeaver, a simple yet effective multimodal fusion framework for arbitrary-modality semantic segmentation. Its core is a Modality Interaction Block (MIB), which enables selective and reliability-aware cross-modal interaction within the encoder, while a lightweight Seam-Aligned Fusion…
