Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation

Zhao Yang; Jiaqi Wang; Yansong Tang; Kai Chen; Hengshuang Zhao; Philip H.S. Torr

arXiv:2303.06345·cs.CV·March 14, 2023·1 cites

TL;DR

This paper introduces a simple, iterative approach for referring image segmentation that progressively refines multi-modal features using a dynamically updated query, improving segmentation quality over existing complex methods.

Contribution

The paper proposes a versatile method in which a query, initialized from language features and continuously updated with object features, guides multi-modal feature learning to produce better segmentation results.

Findings

1. Outperforms state-of-the-art methods on the RefCOCO, RefCOCO+, and G-Ref datasets.

2. More versatile and more straightforward to integrate than existing methods.

3. Recovers missing object parts and removes extraneous parts over successive iterations.

Abstract

Referring image segmentation segments an image from a language expression. With the aim of producing high-quality masks, existing methods often adopt iterative learning approaches that rely on RNNs or stacked attention layers to refine vision-language features. Despite their complexity, RNN-based methods are subject to specific encoder choices, while attention-based methods offer limited gains. In this work, we introduce a simple yet effective alternative for progressively learning discriminative multi-modal features. The core idea of our approach is to leverage a continuously updated query as the representation of the target object and at each iteration, strengthen multi-modal features strongly correlated to the query while weakening less related ones. As the query is initialized by language features and successively updated by object features, our algorithm gradually shifts from being…
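
The iterative idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`cosine_similarity`, `refine_features`), the cosine-similarity gating, the sigmoid soft mask, the mask-weighted pooling used to update the query, and the `num_iters` and `scale` parameters are all assumptions chosen only to show a query that starts from language features, is updated from object features, and is used to strengthen correlated features while weakening unrelated ones.

```python
import numpy as np


def cosine_similarity(feats, query, eps=1e-8):
    """Cosine similarity between each row of feats (N, C) and query (C,)."""
    feats_unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + eps)
    query_unit = query / (np.linalg.norm(query) + eps)
    return feats_unit @ query_unit


def refine_features(features, language_query, num_iters=3, scale=5.0, eps=1e-8):
    """Hypothetical query-guided refinement loop (illustrative only).

    features:       (N, C) multi-modal features, one row per pixel.
    language_query: (C,) pooled language feature used to initialize the query.
    Returns the refined features, a soft mask in (0, 1), and the final query.
    """
    feats = np.array(features, dtype=float)      # copy, so the input is untouched
    query = np.array(language_query, dtype=float)
    mask = np.zeros(feats.shape[0])
    for _ in range(num_iters):
        # Correlation of every pixel feature with the current query, in [-1, 1].
        sim = cosine_similarity(feats, query)
        # Assumed gating: factor in [0, 2] amplifies correlated features
        # and attenuates anti-correlated ones.
        feats = feats * (1.0 + sim)[:, None]
        # Assumed soft object mask derived from the same correlation.
        mask = 1.0 / (1.0 + np.exp(-scale * sim))
        # Query update: mask-weighted average of the refined features, so the
        # query drifts from language features toward object features.
        query = (mask[:, None] * feats).sum(axis=0) / (mask.sum() + eps)
    return feats, mask, query


if __name__ == "__main__":
    toy_feats = np.array([[1.0, 0.1], [0.9, 0.0], [-1.0, 0.2], [0.0, 1.0]])
    toy_query = np.array([1.0, 0.0])
    refined, soft_mask, final_query = refine_features(toy_feats, toy_query)
    print(np.round(soft_mask, 3))
```

In this toy setup the query is a plain vector and the features are a flat list of pixels; a real model would operate on encoder feature maps and learned projections, which the sketch deliberately leaves out.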

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques