MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez

arXiv:2307.09361 · cs.CV · July 17, 2024

TL;DR

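MOCA is a self-supervised learning method that unifies masked image modeling and contrastive learning through mask-and-predict objectives defined on high-level features, achieving state-of-the-art low-shot results with training that is at least 3 times faster than prior methods.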

Contribution

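It introduces a single-stage, standalone mask-and-predict framework whose targets are high-level features rather than pixels, combining the contextual reasoning of masked image modeling with the perturbation invariance of contrastive learning in a computation-efficient way.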

Findings

01. State-of-the-art results in low-shot settings

02. Training at least 3 times faster than prior methods

03. Effective combination of masked image modeling and contrastive learning

Abstract

Self-supervised learning can be used to mitigate the greedy need of Vision Transformer networks for very large, fully annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose a single-stage and standalone method, MOCA, which unifies both desired properties using novel mask-and-predict objectives defined with high-level features (instead of pixel-level details). Moreover, we show how to effectively employ both learning paradigms in a synergistic and computation-efficient way. In doing so, we achieve new state-of-the-art results in low-shot settings and strong experimental results in various evaluation protocols with a training that is at least 3 times faster than prior…
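The abstract names the objective but not its form. The sketch below illustrates one generic way a "predict masked codebook assignments" loss could be written in NumPy, assuming: a teacher that sees the full image and produces token features, soft assignments of those features to a codebook via temperature-scaled cosine similarity, and a student that sees the masked image and is scored only on masked tokens. All function names, the temperature of 0.1, and the EMA-style codebook update are hypothetical illustrations, not taken from the MOCA paper or the valeoai/moca repository.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project vectors onto the unit sphere so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def codebook_assignments(features, codebook, temperature=0.1):
    # Soft assignment of each token feature (T, D) to each codebook word (K, D) -> (T, K).
    sims = l2_normalize(features) @ l2_normalize(codebook).T
    return np.exp(log_softmax(sims / temperature, axis=-1))

def masked_assignment_loss(student_logits, teacher_features, codebook, mask, temperature=0.1):
    # Targets come from the (unmasked) teacher features; the student is scored on masked tokens only.
    targets = codebook_assignments(teacher_features, codebook, temperature)
    per_token = -(targets * log_softmax(student_logits, axis=-1)).sum(axis=-1)
    return per_token[mask].mean()

def ema_update_codebook(codebook, features, momentum=0.99):
    # Hypothetical online update: move each word toward the mean of the features nearest to it.
    nearest = (l2_normalize(features) @ l2_normalize(codebook).T).argmax(axis=1)
    updated = codebook.copy()
    for k in range(codebook.shape[0]):
        members = features[nearest == k]
        if len(members) > 0:
            updated[k] = momentum * codebook[k] + (1.0 - momentum) * members.mean(axis=0)
    return updated

# Toy usage: 16 tokens with 8-dim features, a 4-word codebook, every other token masked.
rng = np.random.default_rng(0)
num_tokens, dim, num_words = 16, 8, 4
teacher_features = rng.normal(size=(num_tokens, dim))
codebook = rng.normal(size=(num_words, dim))
student_logits = rng.normal(size=(num_tokens, num_words))  # stand-in for a student network's output
mask = np.zeros(num_tokens, dtype=bool)
mask[::2] = True

loss = masked_assignment_loss(student_logits, teacher_features, codebook, mask)
codebook = ema_update_codebook(codebook, teacher_features)
```

Scoring only masked tokens pushes the student to infer content from visible context, and using codebook assignments as targets (instead of pixel values) is what makes such an objective operate on high-level features.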


Code & Models

Repositories

valeoai/moca
PyTorch · Official

Videos

MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Absolute Position Encodings · Adam · Layer Normalization · Label Smoothing