Fully Aligned Network for Referring Image Segmentation
Yong Liu, Ruihao Xu, Yansong Tang

TL;DR
This paper introduces a Fully Aligned Network (FAN) for Referring Image Segmentation that emphasizes explicit cross-modal interaction principles, achieving state-of-the-art results with a simple architecture.
Contribution
The paper proposes a novel Fully Aligned Network that follows four interaction principles, improving cross-modal comprehension and segmentation accuracy in RIS tasks.
Findings
Achieves state-of-the-art performance on RefCOCO, RefCOCO+, and G-Ref benchmarks.
Demonstrates the effectiveness of explicit interaction principles in RIS.
Uses a simple yet effective architecture for improved segmentation.
Abstract
This paper focuses on the Referring Image Segmentation (RIS) task, which aims to segment objects from an image based on a given language description. The critical problem of RIS is achieving fine-grained alignment between different modalities to recognize and segment the target object. Recent advances using the attention mechanism for cross-modal interaction have achieved excellent progress. However, current methods tend to lack explicit principles of interaction design as guidelines, leading to inadequate cross-modal comprehension. Additionally, most previous works use a single-modal mask decoder for prediction, losing the advantage of full cross-modal alignment. To address these challenges, we present a Fully Aligned Network (FAN) that follows four cross-modal interaction principles. Under the guidance of reasonable rules, our FAN achieves state-of-the-art performance on the prevalent…
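The abstract refers to attention-based cross-modal interaction without giving details in this summary. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in which flattened visual features attend to word features. It is not FAN's architecture and does not implement the paper's four interaction principles; the function names, tensor shapes, and residual fusion step are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(visual, words):
    """Generic vision-to-language cross-attention (illustrative, not FAN).

    visual: (N, d) array of flattened per-pixel image features.
    words:  (T, d) array of per-word language features.
    Returns the fused visual features (N, d) and attention weights (N, T).
    """
    d = visual.shape[-1]
    scores = visual @ words.T / np.sqrt(d)   # (N, T) pixel-word similarity
    weights = softmax(scores, axis=-1)       # each pixel distributes over words
    attended = weights @ words               # (N, d) language context per pixel
    return visual + attended, weights        # residual fusion (an assumption)


# Hypothetical toy inputs: a 4x4 feature map and a 5-word description.
rng = np.random.default_rng(0)
visual = rng.standard_normal((16, 8))
words = rng.standard_normal((5, 8))

fused, weights = cross_modal_attention(visual, words)
print(fused.shape, weights.shape)
```

In a real RIS model the fused features would go on to a mask decoder; here the point is only that each pixel location receives a weighted mix of word features, which is the basic mechanism the abstract's "cross-modal interaction" builds on.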
Taxonomy
Topics: Advanced Data Compression Techniques · AI in cancer detection · Advanced Image and Video Retrieval Techniques
Methods: Softmax · Attention Is All You Need
