Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

Dongwon Kim; Namyup Kim; Cuiling Lan; Suha Kwak

arXiv:2308.15512·cs.CV·October 25, 2023

TL;DR

This paper introduces a weakly supervised approach to referring image segmentation that trains on text descriptions alone, avoiding costly pixel-level labels. On four public benchmarks it outperforms the existing method for the same task as well as recent open-vocabulary segmentation models.

Contribution

The authors propose a model that discovers semantic entities in the input image and combines those relevant to the text query into the mask of the referent, together with a loss function that trains the model from text descriptions alone, with no manual pixel-level labels.

Findings

01. Outperforms existing methods on four benchmarks

02. Effective with only text descriptions as supervision

03. Outperforms recent open-vocabulary segmentation models

Abstract

Referring image segmentation, the task of segmenting arbitrary entities described in free-form text, opens up a variety of vision applications. However, manual labeling of training data for this task is prohibitively costly, leading to a lack of labeled data for training. We address this issue with a weakly supervised learning approach that uses text descriptions of training images as the only source of supervision. To this end, we first present a new model that discovers semantic entities in the input image and then combines the entities relevant to the text query to predict the mask of the referent. We also present a new loss function that allows the model to be trained without any further supervision. Our method was evaluated on four public benchmarks for referring image segmentation, where it clearly outperformed the existing method for the same task and recent open-vocabulary segmentation…
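
The "discover entities, then combine those relevant to the query" idea from the abstract can be illustrated with a toy sketch. This is not the authors' architecture or loss. It assumes the entity masks and embeddings are already given, and the function name `gather_entities`, the cosine-similarity scoring, and the `temperature` parameter are all hypothetical choices made for illustration only.

```python
import numpy as np

def gather_entities(entity_masks, entity_feats, text_feat, temperature=0.1):
    """Toy 'gather' step: merge per-entity soft masks into one referent mask.

    entity_masks: (K, H, W) soft masks, one per discovered entity (assumed given)
    entity_feats: (K, D) one embedding per entity (assumed given)
    text_feat:    (D,)   embedding of the text query (assumed given)
    """
    # Cosine similarity between every entity embedding and the query embedding.
    e = entity_feats / np.linalg.norm(entity_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sim = e @ t  # shape (K,)

    # Softmax over entities turns similarities into relevance weights.
    logits = sim / temperature
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()

    # Relevance-weighted sum of entity masks gives the predicted referent mask.
    mask = np.tensordot(weights, entity_masks, axes=1)  # shape (H, W)
    return mask, weights
```

With three entities whose embeddings are orthogonal and a query aligned with the first one, nearly all the weight falls on the first entity, so the output mask is close to that entity's mask. The official implementation is the repository listed under Code & Models.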

Code & Models

Repositories

kdwonn/SaG (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques