Comprehensive Multi-Modal Interactions for Referring Image Segmentation

Kanishk Jain; Vineet Gandhi

arXiv:2104.10412·cs.CV·August 16, 2022

TL;DR

This paper introduces a novel multi-modal fusion framework for referring image segmentation that performs simultaneous cross-modal and intra-modal interactions, leading to improved segmentation accuracy.

Contribution

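The paper proposes two modules for referring image segmentation (RIS): a Synchronous Multi-Modal Fusion Module (SFM), which models cross-modal and intra-modal interactions in a single step, and a Hierarchical Cross-Modal Aggregation Module (HCAM), which uses linguistic features to exchange context across the visual hierarchy and refine the segmentation mask.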

Findings

1. Achieves state-of-the-art performance on four benchmark datasets.

2. Demonstrates the effectiveness of simultaneous interaction modeling.

3. Provides comprehensive ablation studies confirming design choices.

Abstract

We investigate Referring Image Segmentation (RIS), which outputs a segmentation map corresponding to the natural language description. Addressing RIS efficiently requires considering the interactions happening across visual and linguistic modalities and the interactions within each modality. Existing methods are limited because they either compute different forms of interactions sequentially (leading to error propagation) or ignore intramodal interactions. We address this limitation by performing all three interactions simultaneously through a Synchronous Multi-Modal Fusion Module (SFM). Moreover, to produce refined segmentation masks, we propose a novel Hierarchical Cross-Modal Aggregation Module (HCAM), where linguistic features facilitate the exchange of contextual information across the visual hierarchy. We present thorough ablation studies and validate our approach's performance on…
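
To make the idea of computing all three interactions at once more concrete, here is a minimal sketch. It is not the authors' SFM. It assumes that visual region features and word features already share one embedding dimension, and it uses plain scaled dot-product self-attention over the concatenated token sequence, with no learned projections. The names `joint_attention` and `softmax`, and the toy inputs, are hypothetical and chosen only for illustration.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(visual, words):
    """Self-attention over the concatenation of visual and word tokens.

    Assumption: each token is a list of floats of the same length d.
    Because every token attends to every other token in one pass, a single
    weight matrix holds the visual-visual, word-word, and visual-word
    interactions together, rather than computing them one after another.
    """
    tokens = visual + words
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    # Each fused token is the attention-weighted average of all tokens.
    fused = [
        [sum(w * t[j] for w, t in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return fused, weights

# Toy inputs (made up): three image regions and two words, d = 2.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = joint_attention(visual, words)

# For the first visual token, split its attention mass by interaction type.
n_visual = len(visual)
intra_visual = sum(weights[0][:n_visual])
cross_modal = sum(weights[0][n_visual:])
```

In this sketch the first `n_visual` columns of each visual row of `weights` correspond to intra-visual interactions and the remaining columns to cross-modal ones; the word rows split the same way into cross-modal and intra-linguistic parts. A sequential design would instead compute these blocks in separate stages, which is the source of the error propagation the abstract describes.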

Code & Models

Repositories

kanji95/SHNET (PyTorch, official)