Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities

Xu Zheng; Yuanhuiyi Lyu; Lin Wang

arXiv:2407.11351 · cs.CV · July 17, 2024




TL;DR

This paper introduces Any2Seg, a framework that learns modality-agnostic representations for semantic segmentation, enabling robust performance across various modalities and conditions by leveraging knowledge distillation from vision-language models.

Contribution

The paper proposes a novel framework that combines language-guided semantic correlation distillation with modality-agnostic feature fusion to make multi-modal semantic segmentation more robust.
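The Contribution pairs distillation with modality-agnostic feature fusion, i.e. a fusion step that must accept whichever modalities happen to be available. The sketch below is only an illustration of that requirement, not the paper's fusion module: it assumes fusion is a fixed, order-invariant pooling over per-modality feature vectors, and the function name `fuse_available` and the mean/max combination are hypothetical choices.

```python
from typing import Dict, List, Optional

Feature = List[float]


def fuse_available(features: Dict[str, Optional[Feature]]) -> Feature:
    """Fuse feature vectors from whichever modalities are present.

    Hypothetical illustration: each output element is the average of the
    element-wise mean and element-wise max over the available modalities,
    so the result ignores modality order and missing (None) entries.
    """
    present = [f for f in features.values() if f is not None]
    if not present:
        raise ValueError("at least one modality must be present")
    dim = len(present[0])
    if any(len(f) != dim for f in present):
        raise ValueError("all modality features must have the same length")
    fused = []
    for i in range(dim):
        column = [f[i] for f in present]  # i-th element from every present modality
        fused.append(0.5 * (sum(column) / len(column) + max(column)))
    return fused


full = fuse_available({"rgb": [1.0, 0.0], "depth": [3.0, 2.0]})
rgb_only = fuse_available({"rgb": [1.0, 0.0], "depth": None})
print(full, rgb_only)  # prints [2.5, 1.5] [1.0, 0.0]
```

Because the pooling is symmetric in its inputs, dropping a modality changes only which vectors are pooled, not the shape of the fused output; a learned fusion module would replace the fixed 0.5 weighting.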

Findings

1. Achieves state-of-the-art results on benchmarks with four modalities.
2. Significantly improves performance in modality-incomplete scenarios.
3. Demonstrates robustness across diverse visual conditions.
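A common way to probe the modality-incomplete setting is to score a model on every non-empty subset of its inputs; with four modalities that is 2^4 - 1 = 15 combinations. This is a minimal sketch of that enumeration, not the paper's evaluation protocol: the modality names are placeholders, and `score_fn` stands for a hypothetical user-supplied evaluator (for example, one returning mIoU for a given subset).

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Placeholder modality names (an assumption for illustration).
MODALITIES: Tuple[str, ...] = ("rgb", "depth", "event", "lidar")


def modality_subsets(modalities: Sequence[str] = MODALITIES) -> List[Tuple[str, ...]]:
    """Every non-empty combination of modalities, single modalities first."""
    subsets: List[Tuple[str, ...]] = []
    for k in range(1, len(modalities) + 1):
        subsets.extend(combinations(modalities, k))
    return subsets


def mean_over_subsets(score_fn: Callable[[Tuple[str, ...]], float],
                      modalities: Sequence[str] = MODALITIES) -> float:
    """Average a per-subset score; score_fn is a hypothetical user-supplied evaluator."""
    subsets = modality_subsets(modalities)
    return sum(score_fn(s) for s in subsets) / len(subsets)


subsets = modality_subsets()
print(len(subsets), subsets[0], subsets[-1])
# prints 15 ('rgb',) ('rgb', 'depth', 'event', 'lidar')
```

Reporting per-subset scores alongside their average shows how strongly a model depends on any single modality, which is the failure mode the modality-incomplete setting is meant to expose.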

Abstract

Image modality is not perfect, as it often fails in certain conditions, e.g., at night and under fast motion. This significantly limits the robustness and versatility of existing multi-modal (i.e., Image+X) semantic segmentation methods when confronting modality absence or failure, as often occurs in real-world applications. Inspired by the open-world learning capability of multi-modal vision-language models (MVLMs), we explore a new direction in learning the modality-agnostic representation via knowledge distillation (KD) from MVLMs. Intuitively, we propose Any2Seg, a novel framework that can achieve robust segmentation from any combination of modalities under any visual conditions. Specifically, we first introduce a novel language-guided semantic correlation distillation (LSCD) module to transfer both inter-modal and intra-modal semantic knowledge in the embedding space from MVLMs, e.g.,…
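The abstract describes LSCD as transferring inter-modal and intra-modal semantic knowledge in the embedding space. One way to make "transferring correlations" concrete is to match pairwise-similarity matrices between teacher and student embeddings. This is an assumption-laden sketch, not the paper's LSCD: teacher rows are assumed to be MVLM embeddings of the same n items the student encoders see, the loss is plain mean squared error, the language guidance is not modeled, and `correlation_distillation_loss` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Dict, List

Matrix = List[List[float]]  # rows are embeddings of the same n items


def _norm(v: List[float]) -> float:
    """Euclidean norm, substituting 1.0 for zero vectors to avoid division by zero."""
    return math.sqrt(sum(x * x for x in v)) or 1.0


def cosine_matrix(a: Matrix, b: Matrix) -> Matrix:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    if len(a[0]) != len(b[0]):
        raise ValueError("rows of a and b must have the same width")
    return [[sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v)) for v in b]
            for u in a]


def mse(p: Matrix, q: Matrix) -> float:
    """Mean squared difference between two equally sized matrices."""
    if len(p) != len(q) or any(len(r) != len(s) for r, s in zip(p, q)):
        raise ValueError("matrices must have the same shape")
    diffs = [(x - y) ** 2 for row_p, row_q in zip(p, q) for x, y in zip(row_p, row_q)]
    return sum(diffs) / len(diffs)


def correlation_distillation_loss(teacher: Matrix, students: Dict[str, Matrix]) -> float:
    """Hypothetical correlation-matching loss (not the paper's exact LSCD)."""
    if not students:
        raise ValueError("at least one student modality is required")
    # Target structure: the teacher's n x n self-similarity matrix.
    target = cosine_matrix(teacher, teacher)
    # Intra-modal terms: each modality's self-similarity vs. the teacher's.
    intra = [mse(cosine_matrix(s, s), target) for s in students.values()]
    # Inter-modal terms: cross-similarity between two modalities vs. the teacher's.
    # This assumes all student modalities embed into the same width.
    inter = [mse(cosine_matrix(students[m], students[k]), target)
             for m, k in combinations(sorted(students), 2)]
    terms = intra + inter
    return sum(terms) / len(terms)


teacher = [[1.0, 0.0], [0.0, 1.0]]
print(correlation_distillation_loss(teacher, {"depth": [[1.0, 0.0], [1.0, 0.0]]}))  # prints 0.5
```

Comparing n x n similarity matrices rather than raw features is what lets a student encoder with a different embedding width learn from the teacher without a projection layer.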


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Knowledge Distillation