Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
Xu Zheng, Yuanhuiyi Lyu, Lin Wang

TL;DR
This paper introduces Any2Seg, a framework that learns modality-agnostic representations for semantic segmentation, enabling robust performance across various modalities and conditions by leveraging knowledge distillation from vision-language models.
Contribution
The paper proposes a novel framework that combines language-guided semantic correlation distillation with modality-agnostic feature fusion to improve the robustness of multi-modal semantic segmentation.
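One thing "modality-agnostic" fusion has to handle is an arbitrary subset of inputs being present. The minimal Python sketch below shows that requirement. It assumes per-modality feature vectors that share one dimension, and it marks a missing or failed modality as None. The modality names and the function fuse_available are hypothetical. Plain element-wise averaging is a simple stand-in chosen for illustration, not the fusion module proposed in the paper.

```python
from typing import Dict, List, Optional

Feature = List[float]


def fuse_available(features: Dict[str, Optional[Feature]]) -> Feature:
    """Element-wise mean over whichever modalities are present.

    Hypothetical stand-in for a modality-agnostic fusion step: the mean
    does not depend on modality order or count, so any non-empty subset
    of inputs yields a fused vector of the same size.
    """
    present = [f for f in features.values() if f is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    dim = len(present[0])
    if any(len(f) != dim for f in present):
        raise ValueError("all modality features must share one dimension")
    return [sum(f[i] for f in present) / len(present) for i in range(dim)]


# Illustrative modality names; RGB and LiDAR are missing (e.g., night, sensor failure).
feats = {"rgb": None, "depth": [1.0, 2.0], "event": [3.0, 4.0], "lidar": None}
print(fuse_available(feats))  # [2.0, 3.0]
```

Averaging ignores how informative each modality is. A learned fusion would typically weight modalities, but the same invariance to which inputs are present is the property being illustrated.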
Findings
Achieves state-of-the-art results on benchmarks with four modalities.
Significantly improves performance in modality-incomplete scenarios.
Demonstrates robustness across diverse visual conditions.
Abstract
The image modality is not perfect, as it often fails in certain conditions, e.g., at night or under fast motion. This significantly limits the robustness and versatility of existing multi-modal (i.e., Image+X) semantic segmentation methods when they confront modality absence or failure, as often occurs in real-world applications. Inspired by the open-world learning capability of multi-modal vision-language models (MVLMs), we explore a new direction: learning a modality-agnostic representation via knowledge distillation (KD) from MVLMs. Accordingly, we propose Any2Seg, a novel framework that can achieve robust segmentation from any combination of modalities in any visual conditions. Specifically, we first introduce a novel language-guided semantic correlation distillation (LSCD) module to transfer both inter-modal and intra-modal semantic knowledge in the embedding space from MVLMs, e.g.,…
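The abstract is cut off before the LSCD details, so the code below is only a generic correlation-distillation sketch in Python, not the paper's module. It makes three assumptions. First, the student's per-modality embeddings have already been projected into the MVLM's shared image-text space. Second, cosine-similarity matrices stand in for the "semantic correlations". Third, they are matched to the teacher's with mean squared error. All names are hypothetical: correlation_distillation_loss, the split into language, intra-modal, and inter-modal terms, and the MSE choice.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def cosine_matrix(a: Matrix, b: Matrix) -> Matrix:
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    an = [normalize(r) for r in a]
    bn = [normalize(r) for r in b]
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in bn] for ra in an]


def mse(p: Matrix, q: Matrix) -> float:
    diffs = [(x - y) ** 2 for rp, rq in zip(p, q) for x, y in zip(rp, rq)]
    return sum(diffs) / len(diffs)


def correlation_distillation_loss(
    student: Dict[str, Matrix], teacher: Matrix, text: Matrix
) -> float:
    """Hypothetical correlation-distillation loss (not the paper's LSCD).

    student: modality name -> N x D embeddings (assumed in the MVLM space)
    teacher: N x D MVLM image embeddings for the same N locations
    text:    C x D MVLM text embeddings of class prompts
    """
    target_lang = cosine_matrix(teacher, text)      # N x C region-to-class correlations
    target_struct = cosine_matrix(teacher, teacher)  # N x N region-to-region correlations
    names = list(student)
    loss = 0.0
    for m in names:
        # Language-guided term: each modality should relate to the classes like the teacher.
        loss += mse(cosine_matrix(student[m], text), target_lang)
        # Intra-modal term: similarity structure within modality m matches the teacher's.
        loss += mse(cosine_matrix(student[m], student[m]), target_struct)
    for i, m in enumerate(names):
        for n in names[i + 1:]:
            # Inter-modal term: cross-modality similarities match the teacher's structure.
            loss += mse(cosine_matrix(student[m], student[n]), target_struct)
    return loss


teacher = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0]]
student = {"rgb": [[1.0, 0.0], [0.0, 1.0]], "depth": [[1.0, 0.0], [1.0, 0.0]]}
print(correlation_distillation_loss(student, teacher, text))  # 1.5
```

In the example, the "rgb" student reproduces the teacher exactly and contributes nothing. The collapsed "depth" embeddings are penalized by their language term, their intra-modal term, and the rgb-depth inter-modal term (0.5 each). Comparing similarity matrices rather than raw features is the usual motivation for correlation-based distillation, because it constrains relational structure instead of exact coordinates.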
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Knowledge Distillation
