Spatial Semantic Recurrent Mining for Referring Image Segmentation
Jiaxing Yang, Lihe Zhang, Jiayu Sun, Huchuan Lu

TL;DR
This paper introduces Spatial Semantic Recurrent Mining (S²RM), a novel method for Referring Image Segmentation that enhances cross-modality fusion by recurrently correlating semantic features across spatial and contextual dimensions.
Contribution
The paper proposes S²RM, a new spatial semantic recurrent framework, and a Cross-scale Abstract Semantic Guided Decoder (CASG) for improved referent segmentation accuracy.
Findings
Outperforms state-of-the-art on four challenging datasets.
Effectively models global relationships and structured semantics.
Enhances cross-modality feature fusion in RIS.
Abstract
Referring Image Segmentation (RIS) consistently requires language and appearance semantics to understand each other better, and this need becomes especially acute in hard cases. To achieve this, existing works tend to resort to various trans-representing mechanisms that directly feed language semantics forward along the main RGB branch, which, however, leaves the referent distribution weakly mined in space and the non-referent semantics contaminated along the channel dimension. In this paper, we propose Spatial Semantic Recurrent Mining (S²RM) to achieve high-quality cross-modality fusion. It follows a three-step working strategy: distributing the language feature, spatial semantic recurrent coparsing, and parsed-semantic balancing. During fusion, S²RM first generates a constraint-weak yet distribution-aware language feature, then bundles features of each row and column from…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Semantic Web and Ontologies
