Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation

Ruihao Xia; Yu Liang; Peng-Tao Jiang; Hao Zhang; Bo Li; Yang Tang; Pan Zhou

arXiv:2410.21708·cs.CV·October 30, 2024

TL;DR

This paper introduces MADM, which uses text-to-image diffusion models for unsupervised modality adaptation in semantic segmentation, improving pseudo-label accuracy and feature resolution across visual modalities such as depth, infrared, and event.

Contribution

MADM leverages diffusion models for pseudo-label generation and introduces label palette and latent regression techniques to enhance cross-modality semantic segmentation.

Findings

01. Achieves state-of-the-art results on multiple modality adaptation tasks
02. Improves pseudo-label accuracy with diffusion-based noise stabilization
03. Enhances feature resolution through label palette and latent regression

Abstract

Despite their success, unsupervised domain adaptation methods for semantic segmentation primarily focus on adaptation between image domains and do not utilize other abundant visual modalities such as depth, infrared, and event. This limitation hinders their performance and restricts their application in real-world multimodal scenarios. To address this issue, we propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task, which utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities. Specifically, MADM comprises two complementary components that tackle the major challenges. First, due to the large modality gap, using data from one modality to generate pseudo-labels for another suffers a significant drop in accuracy. To address this, MADM designs diffusion-based…

Code & Models

Repositories

xiarho/madm (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Diffusion · Focus