MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features

Adrien Bardes; Jean Ponce; Yann LeCun

arXiv:2307.12698 · cs.CV · July 25, 2023 · 5 cites

Open Access

TL;DR

MC-JEPA is a self-supervised learning framework that jointly learns optical flow and content features using a shared encoder, improving motion-aware representations for images and videos.

Contribution

It unifies optical flow estimation and content feature learning into a single self-supervised architecture, enabling mutual benefits and improved motion-aware representations.

Findings

01 Achieves competitive optical flow estimation performance.

02 Improves downstream semantic segmentation results.

03 Demonstrates mutual benefits of joint learning objectives.

Abstract

Self-supervised learning of visual representations has focused on learning content features, which do not capture object motion or location and instead serve to identify and differentiate objects in images and videos. Optical flow estimation, on the other hand, is a task that does not involve understanding the content of the images on which it is estimated. We unify the two approaches and introduce MC-JEPA, a joint-embedding predictive architecture and self-supervised learning approach that jointly learns optical flow and content features within a shared encoder. We demonstrate that the two associated objectives, the optical flow estimation objective and the self-supervised learning objective, benefit from each other and thus learn content features that incorporate motion information. The proposed approach achieves performance on-par with existing unsupervised optical flow benchmarks, as…
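The idea of training one shared encoder under two objectives can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-layer `encode` function, the stand-in `motion_loss` and `content_loss`, and the `content_weight` parameter are hypothetical names, and the actual MC-JEPA losses and architecture are not reproduced here. The sketch only shows that both loss terms depend on the same encoder weights, which is what lets the two objectives influence each other.

```python
import math

def encode(x, weights):
    """Shared encoder (hypothetical): one linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def motion_loss(feat_t, feat_t1):
    """Stand-in motion objective: mean squared feature difference
    between two consecutive frames (not the paper's flow loss)."""
    return sum((a - b) ** 2 for a, b in zip(feat_t, feat_t1)) / len(feat_t)

def content_loss(feat_a, feat_b, eps=1e-8):
    """Stand-in content objective: one minus the cosine similarity
    between features of two views (not the paper's SSL loss)."""
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    norm_a = math.sqrt(sum(a * a for a in feat_a))
    norm_b = math.sqrt(sum(b * b for b in feat_b))
    return 1.0 - dot / (norm_a * norm_b + eps)

def joint_loss(weights, frame_t, frame_t1, view_a, view_b, content_weight=1.0):
    """Weighted sum of both objectives; every term uses the same encoder weights."""
    l_motion = motion_loss(encode(frame_t, weights), encode(frame_t1, weights))
    l_content = content_loss(encode(view_a, weights), encode(view_b, weights))
    return l_motion + content_weight * l_content

# Toy inputs: 3-dimensional "frames" and "views", 2-dimensional features.
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
frame_t, frame_t1 = [1.0, 0.0, 2.0], [0.9, 0.1, 2.1]
view_a, view_b = [1.0, 0.5, -1.0], [1.1, 0.4, -0.9]
total = joint_loss(weights, frame_t, frame_t1, view_a, view_b)
```

In a real training loop the gradient of `total` with respect to `weights` would receive contributions from both terms, so an update driven by the motion term also changes the features seen by the content term, and vice versa.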

Taxonomy

Topics: Advanced Vision and Imaging · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

MethodsFocus