Fully Aligned Network for Referring Image Segmentation

Yong Liu; Ruihao Xu; Yansong Tang

arXiv:2409.19569·cs.CV·October 1, 2024

Open Access

TL;DR

This paper introduces a Fully Aligned Network (FAN) for Referring Image Segmentation that emphasizes explicit cross-modal interaction principles, achieving state-of-the-art results with a simple architecture.

Contribution

The paper proposes a novel Fully Aligned Network that follows four interaction principles, improving cross-modal comprehension and segmentation accuracy in RIS tasks.

Findings

01. Achieves state-of-the-art performance on RefCOCO, RefCOCO+, and G-Ref benchmarks.

02. Demonstrates the effectiveness of explicit interaction principles in RIS.

03. Uses a simple yet effective architecture for improved segmentation.

Abstract

This paper focuses on the Referring Image Segmentation (RIS) task, which aims to segment objects from an image based on a given language description. The critical problem of RIS is achieving fine-grained alignment between different modalities to recognize and segment the target object. Recent advances using the attention mechanism for cross-modal interaction have achieved excellent progress. However, current methods tend to lack explicit principles of interaction design as guidelines, leading to inadequate cross-modal comprehension. Additionally, most previous works use a single-modal mask decoder for prediction, losing the advantage of full cross-modal alignment. To address these challenges, we present a Fully Aligned Network (FAN) that follows four cross-modal interaction principles. Under the guidance of reasonable rules, our FAN achieves state-of-the-art performance on the prevalent…
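The abstract refers to attention-based cross-modal interaction without spelling it out. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in which each visual feature attends over word features. It is not the FAN architecture and does not implement the paper's four interaction principles; the function names, toy vectors, and dimensions are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(visual, words):
    """Generic cross-attention sketch (an assumption, not the paper's module).

    visual: list of N query vectors (e.g. pixel or patch features).
    words:  list of T vectors used as both keys and values (word features).
    Returns N language-aware vectors and the N x T attention weights.
    """
    dim = len(words[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in visual:
        # Similarity of this visual feature to every word feature.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in words]
        attn = softmax(scores)
        # Weighted sum of word features, one value per dimension.
        out = [sum(w * v[d] for w, v in zip(attn, words)) for d in range(dim)]
        outputs.append(out)
        weights.append(attn)
    return outputs, weights


# Toy input: 2 visual features and 3 word features of dimension 2.
visual = [[1.0, 0.0], [0.0, 1.0]]
words = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
fused, attn = cross_attention(visual, words)
```

In this toy input, the first visual feature places most of its attention on the first word and the second on the second word, which is the kind of region-to-word correspondence that fine-grained alignment aims for.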

Taxonomy

Topics: Advanced Data Compression Techniques · AI in cancer detection · Advanced Image and Video Retrieval Techniques

Methods: Softmax · Attention Is All You Need