CHARM: Collaborative Harmonization across Arbitrary Modalities for Modality-agnostic Semantic Segmentation
Lekang Wen, Jing Xiao, Liang Liao, Jiajun Chen, Mi Wang

TL;DR
This paper introduces CHARM, a framework for modality-agnostic semantic segmentation that harmonizes diverse modalities cooperatively instead of homogenizing them, preserving each modality's strengths and improving performance across multiple datasets.
Contribution
The paper proposes a new complementary learning framework with implicit alignment and dual-path optimization to enhance cross-modal harmony and performance in semantic segmentation.
Findings
Consistently outperforms baselines across multiple datasets.
Significant improvements on fragile modalities.
Effective preservation of modality-specific advantages.
Abstract
Modality-agnostic Semantic Segmentation (MaSS) aims to achieve robust scene understanding across arbitrary combinations of input modalities. Existing methods typically rely on explicit feature alignment to achieve modal homogenization, which dilutes the distinctive strengths of each modality and destroys their inherent complementarity. To achieve cooperative harmonization rather than homogenization, we propose CHARM, a novel complementary learning framework designed to implicitly align content while preserving modality-specific advantages through two components: (1) a Mutual Perception Unit (MPU), which enables implicit alignment through window-based cross-modal interaction, where modalities serve as both queries and contexts for each other to discover modality-interactive correspondences; (2) a dual-path optimization strategy that decouples training into Collaborative Learning Strategy (CoL)…
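The window-based cross-modal interaction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' MPU implementation: it assumes two feature maps of identical shape, uses plain scaled dot-product attention with no learned projections, and the function names (`window_partition`, `cross_attend`, `mutual_window_attention`) are hypothetical. It only shows the general idea of each modality acting as query against the other modality's tokens within the same local window.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(feat, win):
    # (H, W, C) -> (num_windows, win*win, C); H and W must be divisible by win.
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0
    x = feat.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def window_merge(windows, H, W, win):
    # Inverse of window_partition: (num_windows, win*win, C) -> (H, W, C).
    C = windows.shape[-1]
    x = windows.reshape(H // win, W // win, win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def cross_attend(query_feat, context_feat, win):
    # Tokens of one modality query the other modality's tokens in the same window.
    # Assumption: no learned Q/K/V projections, just raw features plus a residual.
    H, W, C = query_feat.shape
    q = window_partition(query_feat, win)
    kv = window_partition(context_feat, win)
    attn = softmax(q @ kv.transpose(0, 2, 1) / np.sqrt(C), axis=-1)
    return query_feat + window_merge(attn @ kv, H, W, win)

def mutual_window_attention(feat_a, feat_b, win=4):
    # Each modality serves as query once and as context once.
    return cross_attend(feat_a, feat_b, win), cross_attend(feat_b, feat_a, win)

# Usage with two hypothetical modality feature maps (e.g. RGB and depth).
rng = np.random.default_rng(0)
rgb = rng.normal(size=(8, 8, 16))
depth = rng.normal(size=(8, 8, 16))
rgb_out, depth_out = mutual_window_attention(rgb, depth, win=4)
```

Because the two directions are computed separately and added back residually, each modality keeps its own feature stream while still reading from the other, which is one simple way to interact without forcing both into a single shared representation.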
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
