Learning Pixel Trajectories with Multiscale Contrastive Random Walks
Zhangxing Bian, Allan Jabri, Alexei A. Efros, Andrew Owens

TL;DR
This paper introduces a multiscale contrastive random walk approach for pixel-level space-time correspondence, unifying various video modeling tasks like optical flow and object tracking under a self-supervised framework.
Contribution
It extends contrastive random walks to dense pixel graphs with a hierarchical, coarse-to-fine search, enabling a unified self-supervised learning method for multiple video tasks.
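The coarse-to-fine search can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`local_transition`, `coarse_to_fine_flow`), the window radius, the temperature, the nearest-neighbour upsampling, and the use of a local window even at the coarsest level are all assumptions made for illustration. At each pyramid level, every pixel in frame A gets a softmax distribution over a small window in frame B, centred on the displacement estimated at the previous, coarser level.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(f, eps=1e-8):
    return f / (np.linalg.norm(f, axis=-1, keepdims=True) + eps)

def local_transition(feat_a, feat_b, init_flow, radius=1, temperature=0.07):
    """Per-pixel transition probabilities over a (2r+1)^2 window in frame B.

    feat_a, feat_b: (H, W, C) unit-norm features; init_flow: (H, W, 2) as (dy, dx).
    Returns (probs, flow): probs is (H, W, K); flow is the expected displacement.
    """
    H, W, _ = feat_a.shape
    offsets = np.array([(dy, dx)
                        for dy in range(-radius, radius + 1)
                        for dx in range(-radius, radius + 1)])        # (K, 2)
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    centre = np.stack([ys, xs], axis=-1) + np.rint(init_flow).astype(int)
    cand = centre[:, :, None, :] + offsets[None, None]                # (H, W, K, 2)
    valid = ((cand[..., 0] >= 0) & (cand[..., 0] < H) &
             (cand[..., 1] >= 0) & (cand[..., 1] < W))
    cy = np.clip(cand[..., 0], 0, H - 1)
    cx = np.clip(cand[..., 1], 0, W - 1)
    # Cosine similarity between each A pixel and its K candidates in B.
    sims = np.einsum("hwc,hwkc->hwk", feat_a, feat_b[cy, cx]) / temperature
    sims = np.where(valid, sims, -1e9)                                # mask out-of-frame
    probs = softmax(sims, axis=-1)
    flow = np.rint(init_flow) + np.einsum("hwk,kd->hwd", probs, offsets.astype(float))
    return probs, flow

def coarse_to_fine_flow(pyr_a, pyr_b, radius=1, temperature=0.07):
    """pyr_a, pyr_b: lists of (H, W, C) maps, coarsest first, each level 2x larger."""
    flow = np.zeros(pyr_a[0].shape[:2] + (2,))
    for level, (fa, fb) in enumerate(zip(pyr_a, pyr_b)):
        if level > 0:
            # Upsample the coarser estimate and rescale displacements to this level.
            flow = 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)
        _, flow = local_transition(l2_normalize(fa), l2_normalize(fb),
                                   flow, radius, temperature)
    return flow
```

The point of the hierarchy in this sketch is cost: each pixel compares against only (2r+1)^2 candidates per level rather than every pixel in the other frame, while the coarse levels still allow large displacements to be recovered.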
Findings
Achieves performance competitive with strong task-specific self-supervised approaches on optical flow, keypoint tracking, and video object segmentation.
Unifies multiple tasks with a single self-supervised model.
Demonstrates effectiveness of multiscale hierarchy in space-time correspondence.
Abstract
A range of video modeling tasks, from optical flow to multiple object tracking, share the same fundamental challenge: establishing space-time correspondence. Yet, approaches that dominate each space differ. We take a step towards bridging this gap by extending the recent contrastive random walk formulation to much denser, pixel-level space-time graphs. The main contribution is introducing hierarchy into the search problem by computing the transition matrix between two frames in a coarse-to-fine manner, forming a multiscale contrastive random walk when extended in time. This establishes a unified technique for self-supervised learning of optical flow, keypoint tracking, and video object segmentation. Experiments demonstrate that, for each of these tasks, the unified model achieves performance competitive with strong self-supervised approaches specific to that task. Project webpage:…
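As a rough illustration of the contrastive random walk idea that the abstract builds on, the sketch below chains softmax transition matrices along a palindrome of frames (forward in time, then back) and penalises walkers that fail to return to their starting node. It uses dense, single-scale transition matrices over a small set of nodes, so it does not show the multiscale part. The names `transition` and `cycle_loss` and the temperature value are illustrative assumptions, not the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transition(feat_a, feat_b, temperature=0.07):
    """Row-stochastic transition matrix from nodes of frame A to nodes of frame B.

    feat_a: (N, C), feat_b: (M, C) node embeddings.
    """
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + 1e-8)
    return softmax(a @ b.T / temperature, axis=1)

def cycle_loss(frames, temperature=0.07):
    """Cycle-consistency loss for a walk t0 -> ... -> tk -> ... -> t0.

    frames: list of (N, C) arrays, one per frame.
    Returns (loss, walk), where walk[i, j] is the probability that a walker
    starting at node i ends at node j after the round trip.
    """
    palindrome = frames + frames[-2::-1]          # e.g. [f0, f1, f2, f1, f0]
    walk = np.eye(frames[0].shape[0])
    for fa, fb in zip(palindrome[:-1], palindrome[1:]):
        walk = walk @ transition(fa, fb, temperature)
    # Cross-entropy against the identity: each walker should return home.
    loss = -np.mean(np.log(np.diag(walk) + 1e-12))
    return loss, walk
```

In training, the embeddings would come from a learned encoder and the loss would be minimised by gradient descent; the NumPy version here only evaluates the objective.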
Taxonomy
Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
