Boosting Video Object Segmentation via Space-time Correspondence Learning

Yurong Zhang; Liulei Li; Wenguan Wang; Rong Xie; Li Song; Wenjun Zhang

arXiv:2304.06211·cs.CV·April 14, 2023·1 cites

TL;DR

This paper introduces a correspondence-aware training framework that enhances video object segmentation by explicitly enforcing robust space-time correspondence matching, leading to significant performance improvements without extra annotation or architectural changes.

Contribution

It proposes a novel training method that incorporates contrastive correspondence learning to improve matching-based VOS models, leveraging intrinsic video coherence without additional annotation.
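As a rough illustration of what a contrastive correspondence objective can look like, the sketch below computes a generic InfoNCE-style loss over pixel features from two frames. It is not the paper's loss: the function name `correspondence_nce`, the temperature value, and the assumption that row i of one feature matrix corresponds to row i of the other are all hypothetical choices made for this example. In the paper, the correspondences come from the intrinsic coherence of the video rather than from extra annotation.

```python
import numpy as np

def correspondence_nce(feat_a, feat_b, temperature=0.1):
    """Generic InfoNCE loss over N pixel pairs (illustrative only).

    feat_a, feat_b: (N, C) arrays; row i of feat_a is assumed to
    correspond to row i of feat_b (positive pair). All other rows
    of feat_b act as negatives for row i.
    """
    # L2-normalise so the dot product is a cosine similarity.
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + 1e-8)
    logits = a @ b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return -np.mean(np.diag(log_prob))

# Toy check: perfectly aligned features give a lower loss than misaligned ones.
aligned = np.eye(4)
misaligned = np.roll(aligned, 1, axis=0)
loss_aligned = correspondence_nce(aligned, aligned)
loss_misaligned = correspondence_nce(aligned, misaligned)
```

A loss of this kind would be added to the usual mask-prediction loss during training, so the network and its inference procedure stay unchanged.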

Findings

01 Achieves performance gains on DAVIS and YouTube-VOS benchmarks.

02 No extra annotation cost or architectural modifications required.

03 Improves robustness of space-time correspondence matching in VOS.

Abstract

Current leading solutions for video object segmentation (VOS) typically follow a matching-based regime: for each query frame, the segmentation mask is inferred according to its correspondence to previously processed frames and the first annotated frame. They exploit the supervisory signals from the ground-truth masks for learning mask prediction only, without posing any constraint on the space-time correspondence matching, which is nevertheless the fundamental building block of such a regime. To alleviate this crucial yet commonly ignored issue, we devise a correspondence-aware training framework, which boosts matching-based VOS solutions by explicitly encouraging robust correspondence matching during network learning. Through comprehensively exploring the intrinsic coherence in videos at the pixel and object levels, our algorithm reinforces the standard, fully supervised training of mask…
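To make the matching-based regime concrete, here is a minimal sketch of how a mask can be inferred for a query frame from its correspondence to reference pixels. It assumes per-pixel features have already been extracted and flattened; the function name `propagate_mask`, the softmax temperature, and the toy features are hypothetical and do not come from the paper or its repository.

```python
import numpy as np

def propagate_mask(query_feat, ref_feat, ref_mask, temperature=0.07):
    """Propagate reference labels to query pixels via feature affinity.

    query_feat: (Nq, C) per-pixel features of the query frame
    ref_feat:   (Nr, C) per-pixel features of the reference frame(s)
    ref_mask:   (Nr, K) one-hot or soft object labels of reference pixels
    returns:    (Nq, K) soft labels for the query pixels
    """
    # Cosine similarity between every query pixel and every reference pixel.
    q = query_feat / (np.linalg.norm(query_feat, axis=1, keepdims=True) + 1e-8)
    r = ref_feat / (np.linalg.norm(ref_feat, axis=1, keepdims=True) + 1e-8)
    logits = q @ r.T / temperature
    # Row-wise softmax turns similarities into matching weights.
    logits = logits - logits.max(axis=1, keepdims=True)
    affinity = np.exp(logits)
    affinity = affinity / affinity.sum(axis=1, keepdims=True)
    # Each query pixel receives a weighted average of reference labels.
    return affinity @ ref_mask

# Toy example: 4 reference pixels, 2 objects, orthogonal features.
ref_feat = np.eye(4)
ref_mask = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
query_feat = ref_feat[[2, 0]]  # query pixels that match reference pixels 2 and 0
soft = propagate_mask(query_feat, ref_feat, ref_mask)
labels = soft.argmax(axis=1)
```

Because the predicted mask depends entirely on the affinity matrix, unreliable correspondences translate directly into mask errors, which is the weakness the abstract points to.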

Code & Models

Repositories

wenguanwang/vos_correspondence (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · VOS