DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture

Shentong Mo; Sukmin Yun

arXiv:2405.17995·cs.CV·May 29, 2024

TL;DR

DMT-JEPA enhances joint-embedding predictive architecture by introducing discriminative masked targets based on neighboring patches, significantly improving local semantic understanding and performance across multiple visual tasks.

Contribution

We propose DMT-JEPA, a novel masked modeling objective that generates discriminative latent targets from neighboring patches to improve local semantic understanding.

Findings

01 Improves performance on ImageNet-1K classification

02 Enhances semantic segmentation accuracy on ADE20K

03 Boosts object detection results on COCO

Abstract

The joint-embedding predictive architecture (JEPA) has recently shown impressive results in extracting visual representations from unlabeled imagery under a masking strategy. However, we reveal its disadvantages, notably its insufficient understanding of local semantics. This deficiency originates from masked modeling in the embedding space, which reduces discriminative power and can even lead to the neglect of critical local semantics. To bridge this gap, we introduce DMT-JEPA, a novel masked modeling objective rooted in JEPA, specifically designed to generate discriminative latent targets from neighboring information. Our key idea is simple: we consider a set of semantically similar neighboring patches as a target of a masked patch. To be specific, the proposed DMT-JEPA (a) computes feature similarities between each masked patch and its corresponding neighboring patches…
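Step (a) of the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the square neighborhood window (`radius`), the use of cosine similarity, and the top-`k` selection are all assumptions made for illustration, and the sketch stops where the abstract is truncated, so it does not show how the selected neighbors are turned into a target.

```python
import numpy as np


def neighbor_similarities(features, grid_h, grid_w, masked_idx, radius=1):
    """Cosine similarity between one masked patch and its spatial neighbors.

    features: (grid_h * grid_w, dim) array of patch embeddings in row-major
    order. The square window of the given radius is an assumed definition of
    "neighboring patches"; the source does not specify one.
    Returns (neighbor_indices, similarities).
    """
    row, col = divmod(masked_idx, grid_w)
    neighbors = []
    for r in range(max(0, row - radius), min(grid_h, row + radius + 1)):
        for c in range(max(0, col - radius), min(grid_w, col + radius + 1)):
            idx = r * grid_w + c
            if idx != masked_idx:
                neighbors.append(idx)
    neighbors = np.array(neighbors)

    # L2-normalize so that a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-12)
    sims = normed[neighbors] @ normed[masked_idx]
    return neighbors, sims


def select_similar_neighbors(neighbors, sims, k=4):
    """Keep the k most similar neighbors (hypothetical selection rule)."""
    order = np.argsort(-sims)[:k]
    return neighbors[order], sims[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(16, 8))  # toy 4x4 grid of 8-dim patch features
    nbrs, sims = neighbor_similarities(feats, 4, 4, masked_idx=5)
    top_nbrs, top_sims = select_similar_neighbors(nbrs, sims, k=3)
    print(top_nbrs, top_sims)
```

With a 4x4 grid and `radius=1`, an interior patch has eight candidate neighbors and a corner patch has three; the selected subset plays the role of the "set of semantically similar neighboring patches" that the abstract describes.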

Code & Models

Repositories

dmtjepa/dmtjepa (PyTorch, official)

Taxonomy

Topics: Anomaly Detection Techniques and Applications

Methods: Sparse Evolutionary Training