Joint-Embedding Predictive Architecture for Self-Supervised Learning of Mask Classification Architecture

Dong-Hee Kim; Sungduk Cho; Hyeonwoo Cho; Chanmin Park; Jinyoung Kim; Won Hwa Kim

arXiv:2407.10733·cs.CV·July 16, 2024

Open Access

TL;DR

Mask-JEPA introduces a self-supervised framework combining joint embedding prediction with mask classification architectures, enabling effective universal image segmentation with improved training and adaptability across diverse datasets.

Contribution

The paper presents Mask-JEPA, a novel self-supervised learning method that enhances mask classification architectures for universal image segmentation, addressing training challenges and improving robustness.

Findings

01. Achieves competitive segmentation results on ADE20K, Cityscapes, and COCO.

02. Demonstrates high adaptability and robustness across different training scenarios.

03. Architecture-agnostic design allows seamless integration with various mask classification models.

Abstract

In this work, we introduce Mask-JEPA, a self-supervised learning framework tailored for mask classification architectures (MCA), to overcome the traditional constraints associated with training segmentation models. Mask-JEPA combines a Joint Embedding Predictive Architecture with MCA to adeptly capture intricate semantics and precise object boundaries. Our approach addresses two critical challenges in self-supervised learning: 1) extracting comprehensive representations for universal image segmentation from a pixel decoder, and 2) effectively training the transformer decoder. The use of the transformer decoder as a predictor within the JEPA framework allows proficient training in universal image segmentation tasks. Through rigorous evaluations on datasets such as ADE20K, Cityscapes and COCO, Mask-JEPA demonstrates not only competitive results but also exceptional adaptability and…
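The abstract names the overall structure (a joint-embedding predictive objective with the transformer decoder acting as predictor) but does not spell out the loss, the masking scheme, or how the target branch is updated. The sketch below is therefore a generic JEPA-style objective written under stated assumptions, not the authors' implementation: the exponential-moving-average target encoder, the mean-pooled context, the squared-error loss, and every function name (`linear`, `ema_update`, `jepa_loss`) are illustrative choices, and a single linear map stands in for the transformer-decoder predictor.

```python
def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def ema_update(target, online, momentum=0.99):
    """Assumed target-branch update: exponential moving average of online weights."""
    return [
        [momentum * t + (1.0 - momentum) * o for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target, online)
    ]


def jepa_loss(patches, mask, context_w, target_w, predictor_w):
    """Mean squared error in embedding space, computed on masked patches only.

    patches: list of feature vectors, one per image patch
    mask:    list of bools, True where the patch is hidden from the context branch
    """
    visible = [p for p, m in zip(patches, mask) if not m]
    hidden = [p for p, m in zip(patches, mask) if m]
    if not visible or not hidden:
        raise ValueError("need at least one visible and one masked patch")

    # Context branch: encode visible patches, then mean-pool (an assumption).
    encoded = [linear(context_w, p) for p in visible]
    dim = len(encoded[0])
    context = [sum(e[i] for e in encoded) / len(encoded) for i in range(dim)]

    # Predictor: a linear stand-in for the transformer decoder.
    predicted = linear(predictor_w, context)

    # Target branch: embeddings of the hidden patches (no gradient in real training).
    total = 0.0
    for p in hidden:
        target = linear(target_w, p)
        total += sum((a - b) ** 2 for a, b in zip(predicted, target)) / dim
    return total / len(hidden)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    mask = [False, True, False]
    print(jepa_loss(patches, mask, identity, identity, identity))
```

The point the sketch is meant to convey is that the loss compares embeddings rather than pixels: the predictor is trained to match what the target branch produces for regions the context branch never saw.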

Taxonomy

Topics: Industrial Vision Systems and Defect Detection