Global-Local Self-Distillation for Visual Representation Learning
Tim Lebailly, Tinne Tuytelaars

TL;DR
This paper introduces a global-local self-distillation method for visual representation learning that leverages geometric matchings of local features, improving accuracy especially in low-data regimes.
Contribution
It proposes a novel geometric matching approach for local self-distillation, outperforming similarity-based methods and enhancing self-supervised learning efficiency.
Findings
Geometric matchings outperform similarity matchings in low-data regimes.
Similarity matchings hurt accuracy relative to the baseline in low-data settings.
The method improves downstream accuracy in self-supervised visual learning.
Abstract
The downstream accuracy of self-supervised methods is tightly linked to the proxy task solved during training and the quality of the gradients extracted from it. Richer and more meaningful gradient updates are key to allowing self-supervised methods to learn better and more efficiently. In a typical self-distillation framework, the representations of two augmented images are enforced to be coherent at the global level. Nonetheless, incorporating local cues in the proxy task can be beneficial and improve the model accuracy on downstream tasks. This leads to a dual objective in which, on the one hand, coherence between global representations is enforced and, on the other, coherence between local representations is enforced. Unfortunately, an exact correspondence mapping between two sets of local representations does not exist, making the task of matching local representations from…
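The dual objective and the two matching strategies named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the crop-box convention, the function names, the DINO-style temperatures, and the equal weighting of the two terms are all assumptions made for illustration, and augmentations such as horizontal flips are ignored.

```python
import numpy as np


def patch_centers(box, grid):
    """Centers of a grid x grid patch layout inside a crop box.

    box is (x0, y0, x1, y1) in original-image coordinates (assumed convention).
    Returns an array of shape (grid * grid, 2) in row-major order.
    """
    x0, y0, x1, y1 = box
    xs = x0 + (np.arange(grid) + 0.5) * (x1 - x0) / grid
    ys = y0 + (np.arange(grid) + 0.5) * (y1 - y0) / grid
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)


def geometric_matching(box_a, box_b, grid):
    """For each patch of view A, index of the spatially nearest patch of view B."""
    ca = patch_centers(box_a, grid)
    cb = patch_centers(box_b, grid)
    dist = np.linalg.norm(ca[:, None, :] - cb[None, :, :], axis=-1)
    return dist.argmin(axis=1)


def similarity_matching(feat_a, feat_b):
    """For each local feature of view A, index of the most cosine-similar one in B."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)


def log_softmax(x, temp):
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distill_loss(student, teacher, t_student=0.1, t_teacher=0.04):
    """Cross-entropy between teacher and student distributions (temperatures assumed)."""
    p_teacher = np.exp(log_softmax(teacher, t_teacher))
    return float(-(p_teacher * log_softmax(student, t_student)).sum(axis=-1).mean())


def global_local_loss(s_glob, t_glob, s_loc, t_loc, match, weight=0.5):
    """Weighted sum of a global term and a local term over matched pairs.

    match[i] is the teacher local index paired with student local index i.
    The weight of 0.5 is an illustrative choice, not a value from the paper.
    """
    global_term = distill_loss(s_glob, t_glob)
    local_term = distill_loss(s_loc, t_loc[match])
    return weight * global_term + (1 - weight) * local_term
```

In this sketch, switching between the two strategies only changes how `match` is computed: `geometric_matching` uses the crop boxes alone, whereas `similarity_matching` depends on the features being learned.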
Videos
Global-Local Self-Distillation for Visual Representation Learning · YouTube
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
