Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
Zixu Zhao, Yueming Jin, Pheng-Ann Heng

TL;DR
This paper introduces a self-supervised approach for learning visual correspondence in videos by modeling neighbor relations in a joint space-time graph, improving performance on various video understanding tasks without requiring labeled data.
Contribution
The method models intra-frame neighbor relations alongside inter-frame similarity relations in a joint space-time graph and uses cycle-consistency as the self-supervised signal, outperforming existing self-supervised methods on multiple video tasks.
Findings
Outperforms state-of-the-art self-supervised methods on video tasks
Surpasses some fully supervised algorithms in accuracy
Effective in tasks like object propagation and pose tracking
Abstract
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos. We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges: (i) neighbor relations that determine the aggregation strength from intra-frame neighbors in space, and (ii) similarity relations that indicate the transition probability of inter-frame paths across time. Leveraging the cycle-consistency in videos, our contrastive learning objective discriminates dynamic objects from both their neighboring views and temporal views. Compared with prior works, our approach actively explores the neighbor relations of central instances to learn a latent association between center-neighbor pairs (e.g., "hand -- arm") across time, thus improving the instance discrimination. Without…
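The graph formulation in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes patch embeddings are already available as one `(h*w, d)` array per frame, uses 8-connected grid neighbors and a softmax temperature of 0.1 as placeholder choices, and swaps in a generic palindrome random-walk loss for the paper's contrastive objective. All function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def grid_neighbor_mask(h, w):
    # Boolean (h*w, h*w) mask, True where two patches are 8-connected
    # neighbors on the h x w grid. Assumes the grid has at least two patches.
    ys, xs = np.divmod(np.arange(h * w), w)
    dy = np.abs(ys[:, None] - ys[None, :])
    dx = np.abs(xs[:, None] - xs[None, :])
    return np.maximum(dy, dx) == 1

def aggregate_neighbors(feats, mask, temperature=0.1):
    # Neighbor relation (assumed form): softmax over similarities to
    # intra-frame neighbors gives the aggregation strength for each node.
    f = l2_normalize(feats)
    logits = np.where(mask, f @ f.T / temperature, -np.inf)
    weights = softmax(logits, axis=-1)
    return l2_normalize(feats + weights @ feats)

def transition(a, b, temperature=0.1):
    # Similarity relation: row-stochastic transition probabilities from
    # the nodes of one frame to the nodes of another frame.
    return softmax(l2_normalize(a) @ l2_normalize(b).T / temperature, axis=-1)

def cycle_consistency_loss(frames, h, w):
    # Walk forward through the frames and back again (a palindrome);
    # a cycle-consistent walk returns each node to where it started.
    mask = grid_neighbor_mask(h, w)
    nodes = [aggregate_neighbors(f, mask) for f in frames]
    path = nodes + nodes[-2::-1]
    walk = np.eye(h * w)
    for a, b in zip(path[:-1], path[1:]):
        walk = walk @ transition(a, b)
    loss = -np.mean(np.log(np.diag(walk) + 1e-12))
    return loss, walk

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, d = 3, 3, 8
    frames = [rng.normal(size=(h * w, d)) for _ in range(3)]
    loss, walk = cycle_consistency_loss(frames, h, w)
    print("loss:", loss, "round-trip matrix shape:", walk.shape)
```

In this sketch the round-trip matrix is a product of row-stochastic matrices, so each row is a probability distribution over start nodes, and the loss is the mean negative log-probability of returning to the starting patch.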
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods
Methods: Contrastive Learning
