Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
Xin Ma, Chang Liu, Chunyu Xie, Long Ye, Yafeng Deng, Xiangyang Ji

TL;DR
This paper introduces Disjoint Masking with Joint Distillation (DMJD), a training scheme for masked image modeling. It samples multiple disjoint masked views per image and uses dual prediction targets for masked and visible tokens, improving performance while reducing training time.
Contribution
The paper proposes DMJD, a novel training scheme combining disjoint masking and joint distillation to enhance training efficiency and model performance in masked image modeling.
Findings
DMJD trains ViT in half as many epochs as conventional MIM training.
DMJD improves linear probing accuracy over ConvMAE by 5.8%.
DMJD achieves superior results on downstream tasks like segmentation and detection.
Abstract
Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet been criticized for learning inefficiency. We believe the insufficient utilization of training signals should be responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch with the disjoint regulation to raise the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual branch architecture to respectively predict invisible (masked) and visible (unmasked) tokens with superior learning targets. Rooting in orthogonal perspectives for training efficiency improvement, DM and JD cooperatively accelerate the…
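The disjoint masking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `disjoint_masks`, its parameters, and the top-up rule used when unmasked-so-far tokens run out are all assumptions made for this example. Views are sampled sequentially, each view masks the same number of tokens, and each view prefers tokens that no earlier view has masked.

```python
import numpy as np

def disjoint_masks(num_tokens, mask_ratio, num_views, rng=None):
    """Return a (num_views, num_tokens) boolean array; True marks a masked token.

    Hypothetical helper: sequentially samples masks that overlap as little
    as possible while every view keeps the same masking rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_mask = int(round(num_tokens * mask_ratio))
    masks = np.zeros((num_views, num_tokens), dtype=bool)
    used = np.zeros(num_tokens, dtype=bool)  # tokens masked by any earlier view

    for v in range(num_views):
        # Prefer tokens that no earlier view has masked (the disjoint rule).
        fresh = rng.permutation(np.flatnonzero(~used))
        chosen = fresh[:n_mask]
        if chosen.size < n_mask:
            # Assumption: if fresh tokens run out, top up at random from
            # tokens already masked elsewhere to keep the masking rate fixed.
            stale = rng.permutation(np.flatnonzero(used))
            chosen = np.concatenate([chosen, stale[: n_mask - chosen.size]])
        masks[v, chosen] = True
        used[chosen] = True
    return masks

# Example: 16 tokens, 4 views, each masking 25% of the tokens.
masks = disjoint_masks(16, 0.25, 4, rng=np.random.default_rng(0))
print(masks.sum(axis=1))       # masked tokens per view
print(masks.any(axis=0).sum()) # tokens reconstructed by at least one view
```

When `num_views * mask_ratio` is at most 1, the masked sets are fully disjoint, so the number of tokens used for reconstruction per image grows linearly with the number of views while each view keeps its own masking rate.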
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Mutual Information Machine/Mask Image Modeling
