Beyond One-to-One: Rethinking the Referring Image Segmentation

Yutao Hu; Qixiong Wang; Wenqi Shao; Enze Xie; Zhenguo Li; Jungong Han; Ping Luo

arXiv:2308.13853 · cs.CV · August 29, 2023 · 1 citation

TL;DR

This paper introduces a Dual Multi-Modal Interaction network for referring image segmentation that handles complex expressions referring to multiple or no objects, supported by a new challenging dataset, Ref-ZOM.

Contribution

The paper proposes a novel DMMI network with dual decoders for improved segmentation and introduces the Ref-ZOM dataset for more realistic evaluation.

Findings

01. Achieves state-of-the-art performance on multiple datasets.
02. Performs well on various types of text inputs.
03. Demonstrates robustness in complex referring expressions.

Abstract

Referring image segmentation aims to segment the target object referred to by a natural language expression. However, previous methods rely on the strong assumption that one sentence must describe exactly one target in the image, which is often not the case in real-world applications. As a result, such methods fail when an expression refers to either no objects or multiple objects. In this paper, we address this issue from two perspectives. First, we propose a Dual Multi-Modal Interaction (DMMI) Network, which contains two decoder branches and enables information flow in two directions. In the text-to-image decoder, the text embedding is used to query the visual feature and localize the corresponding target. Meanwhile, the image-to-text decoder is implemented to reconstruct the erased entity-phrase conditioned on the visual feature. In this way, visual features are encouraged to contain the…
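
The erase-and-reconstruct idea mentioned in the abstract can be illustrated with a minimal sketch. It only shows how an entity phrase might be erased from a tokenized expression to produce a masked input and a reconstruction target. The function name `erase_entity_phrase`, the `[MASK]` placeholder, whitespace tokenization, and the hand-picked span are all assumptions made for illustration; the abstract does not specify how the authors locate or mask the phrase, and this is not their implementation.

```python
from typing import List, Tuple

# Hypothetical placeholder token; the abstract does not name one.
MASK_TOKEN = "[MASK]"


def erase_entity_phrase(
    tokens: List[str], span: Tuple[int, int], mask_token: str = MASK_TOKEN
) -> Tuple[List[str], List[str]]:
    """Replace tokens[start:end] with mask tokens.

    Returns (masked_tokens, target_tokens), where target_tokens is the
    erased phrase that an image-to-text decoder would be asked to recover.
    """
    start, end = span
    if not (0 <= start < end <= len(tokens)):
        raise ValueError(f"invalid span {span} for {len(tokens)} tokens")
    target = tokens[start:end]
    masked = tokens[:start] + [mask_token] * (end - start) + tokens[end:]
    return masked, target


# Assumed example: the entity phrase "man" sits at token index 1.
expression = "the man in the red jacket".split()
masked, target = erase_entity_phrase(expression, (1, 2))
print(masked)  # ['the', '[MASK]', 'in', 'the', 'red', 'jacket']
print(target)  # ['man']
```

In a setup like the one the abstract describes, the masked expression would be fed to the image-to-text branch together with the visual features, and the erased tokens would serve as the reconstruction target.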

Code & Models

Repositories

toggle1995/ris-dmmi (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
