Rethinking Random Masking in Self-Distillation on ViT
Jihyeon Seong, Hyunkyung Han

TL;DR
This paper investigates the impact of selective random masking in self-distillation with ViTs, demonstrating that masking only the student's global view improves robustness and attention quality, leading to better downstream results.
Contribution
The paper introduces an asymmetric masking strategy for DINO that masks only the student's global view. This improves robustness, while the unmasked teacher global view and student local views keep critical information available.
Findings
Masking only the student's global view improves robustness.
The approach yields more fine-grained attention maps.
Downstream performance on mini-ImageNet improves.
Abstract
Vision Transformers (ViTs) have demonstrated remarkable performance across a wide range of vision tasks. In particular, self-distillation frameworks such as DINO have contributed significantly to these advances. Within such frameworks, random masking is often utilized to improve training efficiency and introduce regularization. However, recent studies have raised concerns that indiscriminate random masking may inadvertently eliminate critical semantic information, motivating the development of more informed masking strategies. In this study, we explore the role of random masking in the self-distillation setting, focusing on the DINO framework. Specifically, we apply random masking exclusively to the student's global view, while preserving the student's local views and the teacher's global view in their original, unmasked forms. This design leverages DINO's multi-view augmentation scheme…
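The asymmetric view assignment described in the abstract can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation. Each view is a flat list of placeholder patch tokens. Masked patches are replaced by a placeholder mask token rather than dropped (an assumption; the abstract does not say which). The mask ratio and the view counts are hypothetical values. The helper names `random_patch_mask`, `apply_mask`, `build_views` and `dino_loss_pairs` are illustrative only. The pairing rule follows standard DINO, which skips teacher/student pairs taken from the same crop; whether the paper changes this rule is not stated in the abstract.

```python
import random
from typing import List, Sequence, Tuple

Tokens = List[float]  # one view as a flat list of patch "tokens" (placeholders)


def random_patch_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Pick floor(mask_ratio * num_patches) patch indices uniformly at random; True = masked."""
    num_masked = int(mask_ratio * num_patches)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def apply_mask(tokens: Sequence[float], mask: Sequence[bool], mask_token: float) -> Tokens:
    """Replace masked patches with a placeholder mask token (assumption: replace, not drop)."""
    return [mask_token if m else t for t, m in zip(tokens, mask)]


def build_views(
    global_views: Sequence[Sequence[float]],
    local_views: Sequence[Sequence[float]],
    mask_ratio: float,
    rng: random.Random,
    mask_token: float = 0.0,
) -> Tuple[List[Tokens], List[Tokens]]:
    """Asymmetric masking: only the student's copies of the global views are masked."""
    # Teacher: global views only, left intact.
    teacher_inputs = [list(v) for v in global_views]
    # Student: masked global views first, in the same order as the teacher's ...
    student_inputs = [
        apply_mask(v, random_patch_mask(len(v), mask_ratio, rng), mask_token)
        for v in global_views
    ]
    # ... followed by the local views, left intact.
    student_inputs += [list(v) for v in local_views]
    return teacher_inputs, student_inputs


def dino_loss_pairs(num_global: int, num_student: int) -> List[Tuple[int, int]]:
    """(teacher_idx, student_idx) pairs as in standard DINO: each student view is matched
    to every teacher global view except the one cut from the same crop. Student indices
    0..num_global-1 are the (masked) global views, so s == t marks a same-crop pair."""
    return [(t, s) for t in range(num_global) for s in range(num_student) if s != t]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setup: 2 global views of 16 patches, 6 local views of 4 patches.
    globals_ = [[float(i) for i in range(16)] for _ in range(2)]
    locals_ = [[float(i) for i in range(4)] for _ in range(6)]
    teacher, student = build_views(globals_, locals_, mask_ratio=0.5, rng=rng, mask_token=-1.0)
    print(len(teacher), len(student))  # 2 8
    print(len(dino_loss_pairs(len(teacher), len(student))))  # 14
```

The design keeps the teacher's targets computed from unmasked global views. The masked student global views and the unmasked local views are then both matched against those clean targets, which is the mechanism the abstract credits with regularizing training without discarding the information the teacher relies on.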
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection
Methods: Dense Connections · Layer Normalization · Vision Transformer · Softmax · Attention Is All You Need · self-DIstillation with NO labels
