MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning

Mohammadreza Salehi; Shashanka Venkataramanan; Ioana Simion; Efstratios Gavves; Cees G. M. Snoek; Yuki M Asano

arXiv:2506.08694 · cs.CV · July 11, 2025



TL;DR

MoSiC introduces a motion-guided self-supervised learning framework that leverages dense motion trajectories and optimal transport to learn consistent spatiotemporal representations in videos, enhancing robustness in dynamic scenes.

Contribution

The paper presents a novel motion-guided clustering approach using optimal transport and long-range point tracks for self-supervised video representation learning.

Findings

01 Achieves 1-6% improvement on six datasets and benchmarks.

02 Enhances robustness in occlusion and dynamic scene scenarios.

03 Utilizes motion trajectories for temporal feature consistency.

Abstract

Dense self-supervised learning has shown great promise for learning pixel- and patch-level representations, but extending it to videos remains challenging due to the complexity of motion dynamics. Existing approaches struggle as they rely on static augmentations that fail under object deformations, occlusions, and camera movement, leading to inconsistent feature learning over time. We propose a motion-guided self-supervised learning framework that clusters dense point tracks to learn spatiotemporally consistent representations. By leveraging an off-the-shelf point tracker, we extract long-range motion trajectories and optimize feature clustering through a momentum-encoder-based optimal transport mechanism. To ensure temporal coherence, we propagate cluster assignments along tracked points, enforcing feature consistency across views despite viewpoint changes. Integrating motion as an…
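
The two mechanisms named in the abstract, optimal-transport clustering and propagation of cluster assignments along tracked points, can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation. It assumes the optimal-transport step is solved with Sinkhorn-Knopp iterations (a common choice for balanced clustering, not confirmed by the abstract) and that a single teacher assignment per track is reused as the target in every frame. The names `sinkhorn`, `track_consistency_loss`, `eps`, and the toy scores are all hypothetical.

```python
import math

def sinkhorn(scores, eps=0.05, n_iters=50):
    """Assumed OT solver: turn point-to-prototype scores into balanced soft assignments."""
    n, k = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # subtract the max for numerical stability
    q = [[math.exp((s - top) / eps) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: every cluster receives total mass 1/k.
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col[j] * k) for j in range(k)] for i in range(n)]
        # Row step: every point carries total mass 1/n.
        q = [[v / (sum(r) * n) for v in r] for r in q]
    # Rescale so each row is a probability distribution over clusters.
    return [[v * n for v in r] for r in q]

def log_softmax(logits):
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_z for x in logits]

def track_consistency_loss(targets, student_logits):
    """Mean cross-entropy between per-track targets and student predictions.

    targets[p]           : soft cluster assignment of track p (from the teacher)
    student_logits[t][p] : student logits sampled where track p lies in frame t
    """
    total, count = 0.0, 0
    for frame in student_logits:
        for target, logits in zip(targets, frame):
            log_p = log_softmax(logits)
            total -= sum(q * lp for q, lp in zip(target, log_p))
            count += 1
    return total / count

# Toy example (made-up numbers): 4 tracked points, 2 prototypes, 3 frames.
teacher_scores = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
targets = sinkhorn(teacher_scores)
student_logits = [[[2.0, -1.0], [1.5, 0.0], [-0.5, 1.0], [-1.0, 2.0]] for _ in range(3)]
loss = track_consistency_loss(targets, student_logits)
```

In this sketch the same `targets` row is applied to a track in every frame, which is what ties features together over time; a real pipeline would obtain the per-frame logits by sampling dense features at the coordinates returned by the point tracker.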


Code & Models

Repositories

smsd75/mosic (PyTorch, official)


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging · Face recognition and analysis