Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning

Hui Li, Mingjie Sun, Jimin Xiao, Eng Gee Lim, and Yao Zhao

arXiv:2212.10278 · cs.CV · December 21, 2022

TL;DR

This paper introduces a parallel pipeline for referring expression segmentation that isolates localization and segmentation, enabling effective weakly-supervised training with click annotations, and achieves state-of-the-art results.

Contribution

Proposes a parallel position-kernel-segmentation pipeline that improves referring expression segmentation (RES) by separating the localization step from the segmentation step, which in turn enables weakly-supervised learning from click annotations.

Findings

01

Outperforms previous RES methods on multiple benchmarks.

02

Enables weakly-supervised RES training with click annotations.

03

Achieves significant performance gains in both fully- and weakly-supervised settings.

Abstract

Referring Expression Segmentation (RES), which is aimed at localizing and segmenting the target according to the given language expression, has drawn increasing attention. Existing methods jointly consider the localization and segmentation steps, which rely on the fused visual and linguistic features for both steps. We argue that the conflict between the purpose of identifying an object and generating a mask limits the RES performance. To solve this problem, we propose a parallel position-kernel-segmentation pipeline to better isolate and then interact the localization and segmentation steps. In our pipeline, linguistic information will not directly contaminate the visual feature for segmentation. Specifically, the localization step localizes the target object in the image based on the referring expression, and then the visual kernel obtained from the localization step guides the…
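The idea of a visual kernel that guides segmentation can be illustrated with a toy example. The sketch below is not the authors' implementation: the abstract is truncated here, so the details are assumptions. It assumes the localization step yields a position heatmap over the image, that the visual feature at the peak position is used as a 1x1 kernel, and that the mask comes from applying that kernel to the purely visual feature map. The function names (`locate_peak`, `kernel_at`, `segment`) and the toy data are hypothetical.

```python
import math


def locate_peak(heatmap):
    """Return (row, col) of the highest-scoring position in a 2-D heatmap.

    Assumption: the heatmap stands in for the output of a language-guided
    localization step, which is not modeled here.
    """
    best = (0, 0)
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > heatmap[best[0]][best[1]]:
                best = (r, c)
    return best


def kernel_at(features, pos):
    """Read the C-dimensional visual feature at pos and use it as a 1x1 kernel."""
    r, c = pos
    return [channel[r][c] for channel in features]


def segment(features, kernel, threshold=0.5):
    """Apply the kernel to every pixel of the visual features, then threshold."""
    height, width = len(features[0]), len(features[0][0])
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            # Dot product between the kernel and the feature at (r, c).
            logit = sum(k * ch[r][c] for k, ch in zip(kernel, features))
            prob = 1.0 / (1.0 + math.exp(-logit))
            row.append(1 if prob > threshold else 0)
        mask.append(row)
    return mask


# Toy visual features: 2 channels over a 3x3 grid, with an "object"
# occupying the top-left 2x2 block.
features = [
    [[1.0, 1.0, -1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, -1.0]],
    [[0.5, 0.5, -0.5], [0.5, 0.5, -0.5], [-0.5, -0.5, -0.5]],
]
# Hypothetical localization output, peaking inside the object.
heatmap = [[0.1, 0.2, 0.0], [0.3, 0.9, 0.0], [0.0, 0.0, 0.1]]

peak = locate_peak(heatmap)
kernel = kernel_at(features, peak)
mask = segment(features, kernel)
```

In this sketch the language signal only influences where the kernel is read from, and the mask itself is computed from visual features alone, which mirrors the abstract's point that linguistic information does not directly contaminate the visual feature used for segmentation. Under the same assumption, a click annotation could supervise the heatmap without requiring a full mask.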

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Subtitles and Audiovisual Media