Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

Junbum Cha; Jonghwan Mun; Byungseok Roh

arXiv:2212.00785·cs.CV·March 28, 2023

TL;DR

This paper introduces Text-grounded Contrastive Learning (TCL), a framework for open-world semantic segmentation that learns region-text alignment directly and achieves state-of-the-art zero-shot performance using only image-text pairs.

Contribution

It proposes TCL, a framework that learns region-text alignment directly, addressing the train-test discrepancy of earlier contrastive learning methods in open-world segmentation without requiring dense annotations.

Findings

01. Achieves state-of-the-art zero-shot segmentation across 8 datasets.

02. Outperforms existing contrastive learning methods significantly.

03. Provides a unified evaluation protocol for open-world segmentation.

Abstract

We tackle open-world semantic segmentation, which aims at learning to segment arbitrary visual concepts in images, by using only image-text pairs without dense annotations. Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task. However, these CL-based methods suffer from a train-test discrepancy, since they only consider image-text alignment during training, whereas segmentation requires region-text alignment during testing. In this paper, we propose a novel Text-grounded Contrastive Learning (TCL) framework that enables a model to directly learn region-text alignment. Our method generates a segmentation mask for a given text, extracts a text-grounded image embedding from the masked region, and aligns it with text…
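The three steps named in the abstract (generate a mask for a text, pool an image embedding from the masked region, align it with the text) can be illustrated with a toy sketch. This is not the authors' implementation: the sigmoid-of-cosine-similarity mask, the mask-weighted average pooling, the image-to-text InfoNCE loss, and every name and constant below (`text_grounded_mask`, `grounded_embedding`, `info_nce`, `scale`, `bias`, `temperature`) are assumptions chosen only to make the idea concrete in pure Python.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def text_grounded_mask(patches, text, scale=5.0, bias=0.5):
    """Soft mask over patches: sigmoid of shifted, scaled cosine similarity.

    `scale` and `bias` are hypothetical constants, not values from the paper.
    """
    t = normalize(text)
    return [1.0 / (1.0 + math.exp(-scale * (dot(normalize(p), t) - bias)))
            for p in patches]

def grounded_embedding(patches, mask):
    """Text-grounded image embedding: mask-weighted average of patch embeddings."""
    total = sum(mask)
    dim = len(patches[0])
    return [sum(m * p[d] for m, p in zip(mask, patches)) / total
            for d in range(dim)]

def info_nce(image_embs, text_embs, temperature=0.1):
    """Image-to-text InfoNCE loss; matched pairs share the same batch index."""
    loss = 0.0
    for i, img in enumerate(image_embs):
        logits = [dot(normalize(img), normalize(t)) / temperature
                  for t in text_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(image_embs)

# Toy batch: two images with three 2-d patch embeddings each, two captions.
batch_patches = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]],
]
texts = [[1.0, 0.0], [0.0, 1.0]]

# Region-text alignment: each image is pooled under the mask of its own caption.
masks = [text_grounded_mask(p, t) for p, t in zip(batch_patches, texts)]
grounded = [grounded_embedding(p, m) for p, m in zip(batch_patches, masks)]

aligned_loss = info_nce(grounded, texts)
shuffled_loss = info_nce(grounded, texts[::-1])  # captions deliberately mismatched
```

In this toy setup the mask is high on patches that resemble the caption and low elsewhere, so the pooled embedding reflects the described region rather than the whole image, and the loss is lower for correctly paired captions than for mismatched ones. At test time the same mask function could be evaluated once per class name to obtain a segmentation, which is the sense in which training and testing use the same region-text alignment.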

Code & Models

Repositories

kakaobrain/tcl (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: ALIGN · Contrastive Learning