Mask Grounding for Referring Image Segmentation

Yong Xien Chng; Henry Zheng; Yizeng Han; Xuchong Qiu; Gao Huang

arXiv:2312.12198 · cs.CV · March 26, 2024 · 1 citation

TL;DR

This paper introduces MagNet, an approach to Referring Image Segmentation that uses Mask Grounding and cross-modal alignment to improve fine-grained visual-language correspondence, achieving state-of-the-art results on RefCOCO, RefCOCO+, and G-Ref.

Contribution

The paper proposes Mask Grounding as an auxiliary task and a cross-modal alignment module to enhance visual grounding and address modality gaps in RIS.

Findings

01. Significant performance improvements on the RefCOCO, RefCOCO+, and G-Ref benchmarks.
02. Effective integration of Mask Grounding with existing RIS methods.
03. Outperforms previous state-of-the-art approaches.

Abstract

Referring Image Segmentation (RIS) is a challenging task that requires an algorithm to segment objects referred to by free-form language expressions. Despite significant progress in recent years, most state-of-the-art (SOTA) methods still suffer from a considerable language-image modality gap at the pixel and word level. These methods generally 1) rely on sentence-level language features for language-image alignment and 2) lack explicit training supervision for fine-grained visual grounding. Consequently, they exhibit weak object-level correspondence between visual and language features. Without well-grounded features, prior methods struggle to understand complex expressions that require strong reasoning over relationships among multiple objects, especially when dealing with rarely used or ambiguous clauses. To tackle this challenge, we introduce a novel Mask Grounding auxiliary task that…

Code & Models

Repositories

yxchng/mask-grounding (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques