Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding
Jiantao Wu, Shentong Mo, Muhammad Awais, Sara Atito, Zhenhua Feng, and Josef Kittler

TL;DR
This paper introduces MMC, a self-supervised learning method that enhances zero-shot semantic segmentation by improving the discriminative power of vision transformers without finetuning.
Contribution
The paper proposes MMC, a novel self-supervised pretraining (SSP) approach that combines masked image modeling, momentum self-distillation, and global contrast to improve zero-shot segmentation.
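This summary does not say how the momentum component is implemented. In momentum-based self-distillation generally, a teacher network's weights track an exponential moving average (EMA) of the student's weights. The sketch below illustrates only that generic update, under the assumption that MMC follows the common pattern; the function name `ema_update` and the default `momentum` value are hypothetical and are not taken from the paper.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights as an EMA of the student weights.

    Generic momentum (EMA) update, assumed here for illustration only;
    the paper's exact schedule and parameters are not given in this summary.
    Weights are plain lists of floats to keep the sketch dependency-free.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Toy usage: the teacher moves a small step toward the student.
teacher_weights = [1.0, 0.0]
student_weights = [0.0, 1.0]
teacher_weights = ema_update(teacher_weights, student_weights, momentum=0.5)
```

With a momentum close to 1, the teacher changes slowly, which is the usual reason such a teacher is used as a stable distillation target.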
Findings
MMC significantly reduces the overlap between intra-object and inter-object patch similarities.
MMC achieves top-tier zero-shot segmentation results across datasets.
The approach enhances discriminative representations of SSP ViTs.
Abstract
Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data. In the realm of computer vision, pretrained vision transformers (ViTs) have played a pivotal role in advancing transfer learning. Nonetheless, the escalating cost of finetuning these large models has posed a challenge due to the explosion of model size. This study endeavours to evaluate the effectiveness of pure self-supervised learning (SSL) techniques in computer vision tasks, obviating the need for finetuning, with the intention of emulating human-like capabilities in generalisation and recognition of unseen objects. To this end, we propose an evaluation protocol for zero-shot segmentation based on a prompting patch. Given a point on the target object as a prompt, the algorithm calculates the similarity map…
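The prompting-patch protocol described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes patch features are already extracted from a frozen ViT, uses cosine similarity as the similarity measure, and turns the similarity map into a mask with a fixed threshold. The abstract is truncated before it explains how the map is used, so the thresholding step, the function names, and the `threshold` value are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon avoids division by zero


def similarity_map(patch_features, prompt_index):
    """Similarity of every patch to the prompted patch."""
    prompt = patch_features[prompt_index]
    return [cosine(prompt, feature) for feature in patch_features]


def segment(patch_features, prompt_index, threshold=0.5):
    """Assumed post-processing: mark patches whose similarity to the prompt
    meets the threshold. The threshold is a hypothetical choice."""
    sims = similarity_map(patch_features, prompt_index)
    return [s >= threshold for s in sims]


# Toy example: four patches with 2-d features; patch 0 is the prompt.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
mask = segment(features, prompt_index=0)
```

In this toy example the first two patches point in nearly the same direction as the prompt and are selected, while the orthogonal and opposite patches are not. Better-separated intra-object and inter-object similarities make such a mask less sensitive to the threshold choice.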
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
