Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning

Zixu Zhao; Yueming Jin; Pheng-Ann Heng

arXiv:2109.13499·cs.CV·September 29, 2021

Open Access

TL;DR

This paper introduces a self-supervised approach for learning visual correspondence from unlabeled videos by modeling neighbor relations in a joint space-time graph. The learned correspondence improves performance on downstream video tasks such as object propagation and pose tracking, without requiring labeled data.

Contribution

Unlike prior works, the method explicitly models neighbor relations between patches in a joint space-time graph and uses cycle-consistency in videos as a self-supervised contrastive learning signal, outperforming existing self-supervised methods on multiple video tasks.
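The cycle-consistency signal can be illustrated with a minimal NumPy sketch. It assumes a random-walk formulation similar to the contrastive random walk: patch embeddings in consecutive frames define row-stochastic transition matrices via a temperature-scaled softmax over cosine similarity, and a walk along the palindrome path t0 → tT → t0 is trained to return each patch to where it started. The helper names (`transition`, `cycle_consistency_loss`), the temperature of 0.07, and the cross-entropy-against-identity loss are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def transition(feat_a, feat_b, temperature=0.07):
    # Hypothetical helper. Row-stochastic matrix: P[i, j] is the probability of
    # stepping from patch i in frame a to patch j in frame b (cosine similarity).
    sim = l2_normalize(feat_a) @ l2_normalize(feat_b).T
    return softmax(sim / temperature, axis=1)

def cycle_consistency_loss(frames, temperature=0.07):
    # Hypothetical helper. frames: list of (N, D) arrays, N patches of dim D.
    # Walk forward through the frames, then back to the first (palindrome path).
    path = frames + frames[-2::-1]
    P = np.eye(frames[0].shape[0])
    for a, b in zip(path[:-1], path[1:]):
        P = P @ transition(a, b, temperature)
    # Each patch should land on itself: cross-entropy against identity targets.
    diag = np.clip(np.diag(P), 1e-12, 1.0)
    return -np.mean(np.log(diag)), P
```

Because the target only asks each patch to return to itself, a sequence whose frames contain the same distinct patches in any spatial order gives a near-zero loss, while ambiguous matches spread probability mass along the walk and raise it.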

Findings

1. Outperforms state-of-the-art self-supervised methods on several video correspondence tasks

2. Surpasses some fully supervised algorithms in accuracy

3. Effective in tasks such as object propagation and pose tracking

Abstract

This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos. We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges: (i) neighbor relations that determine the aggregation strength from intra-frame neighbors in space, and (ii) similarity relations that indicate the transition probability of inter-frame paths across time. Leveraging the cycle-consistency in videos, our contrastive learning objective discriminates dynamic objects from both their neighboring views and temporal views. Compared with prior works, our approach actively explores the neighbor relations of central instances to learn a latent association between center-neighbor pairs (e.g., "hand -- arm") across time, thus improving the instance discrimination. Without…
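The two edge types described above can be sketched in NumPy. The abstract does not give the exact formulation, so the following are assumptions: patches lie on an H×W grid; a patch's neighbor relations are softmax weights over its 8 surrounding patches (3×3 window, edge-padded at borders) and are used to aggregate a neighbor context vector; similarity relations are a row-stochastic transition matrix from a temperature-scaled softmax over patch dot products between two frames. `neighbor_aggregate` and `similarity_transition` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def neighbor_aggregate(feat, temperature=0.1):
    """Hypothetical intra-frame (spatial) edges.
    feat: (H, W, D) grid of patch embeddings for one frame.
    Returns (H, W, D) neighbor context and (H, W, 8) aggregation weights."""
    H, W, D = feat.shape
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    # Edge-pad so border patches still have 8 (replicated) neighbors.
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neigh = np.stack(
        [padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W] for dy, dx in offsets],
        axis=2,
    )  # (H, W, 8, D)
    # Neighbor-relation weights: softmax over center-neighbor affinities.
    logits = np.einsum("hwd,hwkd->hwk", feat, neigh) / temperature
    weights = softmax(logits, axis=-1)
    # Aggregate neighbor embeddings with those weights.
    context = np.einsum("hwk,hwkd->hwd", weights, neigh)
    return context, weights

def similarity_transition(feat_a, feat_b, temperature=0.07):
    """Hypothetical inter-frame (temporal) edges: row-stochastic (HW, HW)
    transition probabilities from patches of frame a to patches of frame b."""
    a = feat_a.reshape(-1, feat_a.shape[-1])
    b = feat_b.reshape(-1, feat_b.shape[-1])
    return softmax(a @ b.T / temperature, axis=1)
```

One simple, assumed way to let neighbors inform temporal matching is to add the aggregated context to each center embedding (for example `node = feat + context`) before calling `similarity_transition`; how the paper actually fuses center and neighbor information is not stated in the abstract.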

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods

Methods: Contrastive Learning