Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation

Gensheng Pei; Yazhou Yao; Jianbo Jiao; Wenguan Wang; Liqiang Nie; Jinhui Tang

arXiv:2404.13505 · cs.CV · April 23, 2024 · 1 citation

TL;DR

This paper introduces HVC, a self-supervised hybrid static-dynamic visual correspondence framework for video object segmentation that efficiently learns from static images, reducing training time and memory while achieving state-of-the-art results.

Contribution

HVC is the first to combine static and dynamic visual correspondence in a self-supervised manner for VOS, requiring only one training session on static images.

Findings

01. Achieves state-of-the-art results on self-supervised VOS benchmarks.
02. Reduces training time to approximately 2 hours and memory to 16 GB.
03. Effectively propagates video labels using static image data.
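
The label propagation mentioned in the last finding can be illustrated with a generic affinity-based scheme that is common in self-supervised VOS evaluation. The sketch below is an assumption for illustration only: the function name `propagate_labels`, the `topk` and `temperature` parameters, and the toy features are hypothetical, and the listing does not confirm that HVC uses exactly this procedure.

```python
import math
from typing import List, Sequence


def propagate_labels(
    ref_feats: Sequence[Sequence[float]],
    ref_labels: Sequence[int],
    qry_feats: Sequence[Sequence[float]],
    num_classes: int,
    topk: int = 2,
    temperature: float = 0.1,
) -> List[int]:
    """Assign each query feature a label by a softmax-weighted vote over
    its top-k most similar reference features (hypothetical helper)."""
    predictions = []
    for q in qry_feats:
        # Dot-product similarity between the query and every reference feature.
        sims = [sum(a * b for a, b in zip(q, r)) for r in ref_feats]
        # Keep only the k most similar reference positions.
        top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:topk]
        # Numerically stable softmax weights over the retained similarities.
        peak = max(sims[i] for i in top)
        weights = [math.exp((sims[i] - peak) / temperature) for i in top]
        # Accumulate the weights per class and pick the strongest class.
        scores = [0.0] * num_classes
        for w, i in zip(weights, top):
            scores[ref_labels[i]] += w
        predictions.append(max(range(num_classes), key=lambda c: scores[c]))
    return predictions


# Toy example: three labelled reference pixels, two unlabelled query pixels.
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
labels = [0, 1, 0]
queries = [[0.9, 0.1], [0.1, 0.9]]
result = propagate_labels(reference, labels, queries, num_classes=2)
```

In practice the features would come from the trained encoder and the reference labels from the annotated first frame; here both are hand-written toy values.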

Abstract

Conventional video object segmentation (VOS) methods usually necessitate a substantial volume of pixel-level annotated video data for fully supervised learning. In this paper, we present HVC, a hybrid static-dynamic visual correspondence framework for self-supervised VOS. HVC extracts pseudo-dynamic signals from static images, enabling an efficient and scalable VOS model. Our approach utilizes a minimalist fully-convolutional architecture to capture static-dynamic visual correspondence in image-cropped views. To achieve this objective, we present a unified self-supervised approach to learn visual representations of static-dynamic feature similarity. Firstly, we establish static correspondence by utilizing a priori coordinate information between cropped views to guide the formation of consistent static feature representations. Subsequently, we devise a concise…
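
The idea of using a priori crop coordinates to establish static correspondence can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `cell_centers` and `static_matches`, the square feature grid, and the box format `(x0, y0, x1, y1)` are all hypothetical. Each feature cell of view A is mapped back to source-image coordinates and then matched to the cell of view B that covers the same point, if any.

```python
from typing import Dict, Iterator, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in source-image pixels
Cell = Tuple[int, int]                   # (row, col) on a crop's feature grid


def cell_centers(box: Box, grid: int) -> Iterator[Tuple[Cell, Tuple[float, float]]]:
    """Yield each grid cell with its center expressed in source-image coordinates."""
    x0, y0, x1, y1 = box
    cell_w, cell_h = (x1 - x0) / grid, (y1 - y0) / grid
    for row in range(grid):
        for col in range(grid):
            yield (row, col), (x0 + (col + 0.5) * cell_w, y0 + (row + 0.5) * cell_h)


def static_matches(box_a: Box, box_b: Box, grid: int) -> Dict[Cell, Cell]:
    """Map cells of crop A to the cells of crop B that cover the same image point.

    Cells of A whose centers fall outside crop B have no match and are omitted.
    """
    x0, y0, x1, y1 = box_b
    cell_w, cell_h = (x1 - x0) / grid, (y1 - y0) / grid
    matches: Dict[Cell, Cell] = {}
    for cell, (x, y) in cell_centers(box_a, grid):
        if x0 <= x < x1 and y0 <= y < y1:
            row_b = min(grid - 1, int((y - y0) // cell_h))
            col_b = min(grid - 1, int((x - x0) // cell_w))
            matches[cell] = (row_b, col_b)
    return matches


# Two overlapping 4x4-pixel crops of the same image, each with a 4x4 feature grid.
pairs = static_matches((0, 0, 4, 4), (2, 2, 6, 6), grid=4)
```

Matched cell pairs like these could serve as positives for a feature-consistency objective between the two views; the exact loss used by HVC is not given in this excerpt.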

Code & Models

Repositories

nust-machine-intelligence-laboratory/hvc (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Vision and Imaging

Methods: VOS