Joint-task Self-supervised Learning for Temporal Correspondence

Xueting Li; Sifei Liu; Shalini De Mello; Xiaolong Wang; Jan Kautz; Ming-Hsuan Yang

arXiv:1909.11895·cs.CV·September 27, 2019·53 cites

Open Access · 2 Repos

TL;DR

This paper introduces a self-supervised learning approach that jointly learns dense temporal correspondence in videos by integrating region tracking and pixel-level matching through a shared affinity matrix, outperforming existing methods.

Contribution

It presents a novel joint-task framework that leverages the synergy between region and pixel-level tasks for improved video correspondence learning.

Findings

01 Outperforms state-of-the-art self-supervised methods on a variety of visual correspondence tasks

02 Surpasses the affinity feature representation of a fully supervised ResNet-18

03 Evaluated on video-object and part-segmentation propagation, keypoint tracking, and object tracking

Abstract

This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner. Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames. We exploit the synergy between both tasks through a shared inter-frame affinity matrix, which simultaneously models transitions between video frames at both the region and pixel levels. Region-level localization reduces ambiguities in fine-grained matching by narrowing down search regions, while fine-grained matching provides bottom-up features that facilitate region-level localization. Our method outperforms the state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object tracking. Our…
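
The inter-frame affinity matrix mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `affinity_matrix` and `propagate`, the `temperature` parameter, the column-wise softmax normalization, and the toy random features are all assumptions made for illustration. The sketch only shows the general idea of building a soft correspondence between two frames from per-pixel features and using it to carry labels from one frame to the next.

```python
import numpy as np

def affinity_matrix(feat_a, feat_b, temperature=1.0):
    """Soft correspondence between two frames (hypothetical helper).

    feat_a: (N_a, C) per-pixel features of frame A.
    feat_b: (N_b, C) per-pixel features of frame B.
    Returns an (N_a, N_b) matrix whose columns each sum to 1, so every
    pixel in frame B is described as a mixture of pixels in frame A.
    """
    logits = feat_a @ feat_b.T / temperature
    # Subtract the column-wise maximum for numerical stability.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=0, keepdims=True)

def propagate(affinity, values_a):
    """Carry per-pixel values (N_a, D) from frame A over to frame B."""
    return affinity.T @ values_a

# Toy data (assumption): frame B contains the pixels of frame A in shuffled order.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(6, 16))
feat_a /= np.linalg.norm(feat_a, axis=1, keepdims=True)
perm = np.array([2, 0, 5, 1, 4, 3])
feat_b = feat_a[perm]

A = affinity_matrix(feat_a, feat_b, temperature=0.01)

# One-hot "labels" identify each pixel of frame A; propagation should
# recover which frame-A pixel each frame-B pixel came from.
labels_a = np.eye(6)
labels_b = propagate(A, labels_a)
recovered = labels_b.argmax(axis=1)
```

In this toy setting, `recovered` matches `perm` because each frame-B feature is identical to exactly one frame-A feature. The same propagation step applies to segmentation masks or keypoint heatmaps in place of the one-hot labels, which is the sense in which a single affinity matrix can serve several correspondence tasks.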

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning