Contrastive Video-Language Segmentation

Chen Liang; Yawei Luo; Yu Wu; Yi Yang

arXiv:2109.14131 · cs.CV · September 30, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a contrastive learning approach for video-language segmentation that explicitly aligns referred objects with language descriptions, improving the distinction of semantically similar objects in videos.

Contribution

It proposes a novel contrastive learning framework with hard instance mining strategies to enhance object-language alignment in video segmentation.

Findings

01 Achieves state-of-the-art results on the A2D Sentences and J-HMDB Sentences benchmarks.

02 Demonstrates improved differentiation between semantically similar objects.

03 Qualitative results show more accurate object distinction.

Abstract

We focus on the problem of segmenting a certain object referred to by a natural language sentence in video content, a task at the core of formulating a pinpoint vision-language relation. While existing attempts mainly construct such a relation in an implicit way, i.e., grid-level multi-modal feature fusion, this paradigm has proven problematic for distinguishing semantically similar objects. In this work, we propose to intertwine the visual and linguistic modalities in an explicit way via a contrastive learning objective, which directly aligns the referred object with the language description and pushes the unreferred content away across frames. Moreover, to remedy the degradation problem, we present two complementary hard instance mining strategies, i.e., Language-relevant Channel Filter and Relative Hard Instance Construction. They encourage the network to exclude…
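
As a rough illustration of the explicit alignment the abstract describes, the sketch below computes a generic InfoNCE-style contrastive loss between one sentence embedding, one embedding of the referred object (the positive), and several embeddings of unreferred content (the negatives). It is not the authors' formulation: the function name `info_nce_loss`, the temperature value, the cosine similarity, and the use of one pooled vector per object are all assumptions made for illustration, and the two hard instance mining strategies are not modeled.

```python
import numpy as np


def info_nce_loss(text_emb, pos_emb, neg_embs, temperature=0.1):
    """Generic InfoNCE-style loss for a single sentence (illustrative only).

    text_emb: (d,)   sentence embedding
    pos_emb:  (d,)   embedding of the referred object (assumed pooled)
    neg_embs: (n, d) embeddings of unreferred content, e.g. other objects
    """
    def l2_normalize(x):
        # Unit-normalize so the dot product below is a cosine similarity.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    text = l2_normalize(np.asarray(text_emb, dtype=float))
    # Row 0 is the positive; the remaining rows are negatives.
    candidates = l2_normalize(
        np.vstack([np.asarray(pos_emb, dtype=float)[None, :],
                   np.asarray(neg_embs, dtype=float)])
    )

    logits = candidates @ text / temperature   # shape (n + 1,)
    logits = logits - logits.max()             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())

    # Cross-entropy with the positive (index 0) as the target.
    return float(-log_probs[0])


# Toy 2-D example: the sentence points along the x-axis.
sentence = np.array([1.0, 0.0])
aligned = info_nce_loss(sentence,
                        pos_emb=np.array([1.0, 0.0]),
                        neg_embs=np.array([[0.0, 1.0], [-1.0, 0.0]]))
misaligned = info_nce_loss(sentence,
                           pos_emb=np.array([0.0, 1.0]),
                           neg_embs=np.array([[1.0, 0.0], [-1.0, 0.0]]))
print(aligned < misaligned)
```

The loss is small when the sentence is closer to the referred object than to any negative, and large when a negative is the closest match, which is the behaviour a contrastive objective of this kind rewards. Negatives that are semantically similar to the referred object contribute the most to the loss, which gives some intuition for why mining hard instances matters.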

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Learning