MixMask: Revisiting Masking Strategy for Siamese ConvNets
Kirill Vishniakov, Eric Xing, Zhiqiang Shen

TL;DR
MixMask introduces a novel filling-based masking strategy for Siamese ConvNets that improves training efficiency and performance across multiple tasks by replacing erased regions with content from other images and employing an adaptive loss.
Contribution
This work proposes MixMask, a filling-based masking approach with an adaptive loss for Siamese ConvNets, addressing limitations of erase-based masking and enhancing self-supervised learning.
Findings
Outperforms Masked Siamese ConvNets (MSCN) across multiple tasks
Improves linear probing, semi-supervised finetuning, and supervised finetuning results
Improves object detection and segmentation results
Abstract
The recent progress in self-supervised learning has successfully combined Masked Image Modeling (MIM) with Siamese Networks, harnessing the strengths of both methodologies. Nonetheless, certain challenges persist when integrating conventional erase-based masking within Siamese ConvNets. Two primary concerns are: (1) The continuous data processing nature of ConvNets, which doesn't allow for the exclusion of non-informative masked regions, leading to reduced training efficiency compared to the ViT architecture; (2) The misalignment between erase-based masking and the contrastive-based objective, distinguishing it from the MIM technique. To address these challenges, this work introduces a novel filling-based masking approach, termed MixMask. The proposed method replaces erased areas with content from a different image, effectively countering the information depletion seen in…
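The filling-based idea described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `block_mask` and `fill_mask_mix`, the grid size, the mask ratio, and the choice to pair each image with the one at the opposite end of the batch. The adaptive loss is not covered, since the text above does not specify it.

```python
import numpy as np


def block_mask(height, width, grid=4, mask_ratio=0.5, rng=None):
    """Return a (height, width) mask: 1 keeps a pixel, 0 marks it for filling.

    Assumes height and width are divisible by `grid` (illustrative choice).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_cells = grid * grid
    n_masked = int(round(mask_ratio * n_cells))
    cells = np.ones(n_cells, dtype=np.float32)
    # Pick which grid cells get replaced by content from another image.
    cells[rng.choice(n_cells, size=n_masked, replace=False)] = 0.0
    cells = cells.reshape(grid, grid)
    # Expand each grid cell to a block of pixels.
    block = np.ones((height // grid, width // grid), dtype=np.float32)
    return np.kron(cells, block)


def fill_mask_mix(images, mask):
    """Fill masked regions of each image with pixels from a partner image.

    images: array of shape (B, C, H, W); mask: array of shape (H, W).
    The partner is the image at the mirrored batch index (an assumption).
    """
    partner = images[::-1]
    return images * mask + partner * (1.0 - mask)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.random((2, 3, 8, 8)).astype(np.float32)
    mask = block_mask(8, 8, grid=4, mask_ratio=0.5, rng=rng)
    mixed = fill_mask_mix(batch, mask)
    print(mixed.shape)
```

Unlike erase-based masking, every pixel of the mixed input still carries image content, so a ConvNet does not spend computation on blank regions. A single mask is shared across the batch here only to keep the sketch short.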
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Mutual Information Machine/Mask Image Modeling
