Cross-Modal Progressive Comprehension for Referring Segmentation

Si Liu; Tianrui Hui; Shaofei Huang; Yunchao Wei; Bo Li; Guanbin Li

arXiv:2105.07175 · cs.CV · May 18, 2021 · 1 citation

TL;DR

This paper introduces a progressive cross-modal comprehension scheme for referring segmentation. It mimics human reasoning by first focusing on candidate entities and then on their relations, and it reports state-of-the-art results on four referring image segmentation benchmarks and three referring video segmentation benchmarks.

Contribution

The paper proposes the Cross-Modal Progressive Comprehension (CMPC) scheme, implemented as a CMPC-I module for images and a CMPC-V module for videos, to strengthen cross-modal feature interaction and reasoning in referring segmentation models.

Findings

1. Achieves new state-of-the-art on four image segmentation benchmarks.
2. Achieves new state-of-the-art on three video segmentation benchmarks.
3. Effectively models human-like progressive reasoning in multimodal understanding.

Abstract

Given a natural language expression and an image/video, the goal of referring segmentation is to produce the pixel-level masks of the entities described by the subject of the expression. Previous approaches tackle this problem by implicit feature interaction and fusion between visual and linguistic modalities in a one-stage manner. However, humans tend to solve the referring problem in a progressive manner based on informative words in the expression, i.e., first roughly locating candidate entities and then distinguishing the target one. In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) scheme to effectively mimic human behaviors and implement it as a CMPC-I (Image) module and a CMPC-V (Video) module to improve referring image and video segmentation models. For image data, our CMPC-I module first employs entity and attribute words to perceive all the related…
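The two-stage idea described in the abstract (first locate candidates, then single out the target) can be illustrated with a purely symbolic toy. This is not the CMPC-I or CMPC-V module, which operate on visual and linguistic features; it is a minimal sketch under the assumption that a scene is already parsed into entities with attributes and relations. The names `Entity`, `locate_candidates`, and `distinguish_target` are hypothetical and exist only for this illustration.

```python
# Toy illustration only: a symbolic stand-in for progressive comprehension.
# Nothing here comes from the paper's code; all names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str                                      # entity word, e.g. "man"
    attributes: set = field(default_factory=set)   # attribute words, e.g. {"standing"}
    relations: set = field(default_factory=set)    # (relation, other entity) pairs


def locate_candidates(scene, entity_word, attribute_words):
    """Stage 1: keep every entity matching the entity and attribute words."""
    wanted = set(attribute_words)
    return [e for e in scene if e.name == entity_word and wanted <= e.attributes]


def distinguish_target(candidates, relation):
    """Stage 2: among the candidates, keep those that satisfy the relation."""
    return [e for e in candidates if relation in e.relations]


scene = [
    Entity("man", {"standing"}, {("holding", "frisbee")}),
    Entity("man", {"standing"}, {("next to", "dog")}),
    Entity("man", {"sitting"}, {("holding", "frisbee")}),
    Entity("dog", {"brown"}),
]

# Expression: "the standing man holding a frisbee"
candidates = locate_candidates(scene, "man", ["standing"])
targets = distinguish_target(candidates, ("holding", "frisbee"))
```

In this toy, the first stage alone leaves two standing men as candidates; only the relation check in the second stage narrows them down to one, which is the motivation the abstract gives for reasoning progressively rather than in a single step.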

Code & Models

Repositories

spyflying/CMPC-Refseg (TensorFlow, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques