Spatial Cross-Attention Improves Self-Supervised Visual Representation Learning
Mehdi Seyfi, Amin Banitalebi-Dehkordi, and Yong Zhang

TL;DR
This paper introduces a spatial cross-attention module that improves self-supervised visual representation learning by capturing intra-class spatial correlations. It boosts performance on several downstream tasks, and the module can be removed at inference without modifying the trained weights.
Contribution
It proposes an add-on module for existing self-supervised methods such as SwAV that incorporates spatial cross-correlations between samples, boosting performance while leaving the inference-time model unchanged.
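The summary does not give the module's equations. As a rough illustration of what spatial cross-attention between two same-class samples can mean, the sketch below applies generic scaled dot-product attention from every spatial position of one feature map to every position of another. It is a hypothetical stand-in, not the authors' module: the helper name `spatial_cross_attention`, the absence of learned query/key/value projections, and the toy inputs are all assumptions.

```python
import math
from typing import List, Tuple

# Rows are spatial positions (e.g. a flattened H*W feature map), columns are channels.
Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_cross_attention(query: Matrix, key: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical helper: plain scaled dot-product cross-attention from every
    spatial position of image A (`query`) to every position of image B (`key`).
    No learned projections; B's features serve as both keys and values."""
    channels = len(query[0])
    scale = 1.0 / math.sqrt(channels)
    # weights[i][j]: how strongly position i of A attends to position j of B.
    weights = [
        softmax([scale * sum(q * k for q, k in zip(q_row, k_row)) for k_row in key])
        for q_row in query
    ]
    # Each output position is a weighted average of B's feature vectors.
    attended = [
        [sum(w * k_row[c] for w, k_row in zip(w_row, key)) for c in range(channels)]
        for w_row in weights
    ]
    return attended, weights


# Usage: two toy 2-channel feature maps, assumed to come from same-class images.
feat_a = [[1.0, 0.0], [0.0, 1.0]]              # 2 spatial positions
feat_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 spatial positions
out, attn = spatial_cross_attention(feat_a, feat_b)
```

A module like this sits on top of the backbone's feature maps, so dropping it leaves the backbone itself untouched; this is in line with the paper's claim that the add-on can be removed at inference without modifying the learned weights.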
Findings
Improved class activation map detection
Higher top-1 classification accuracy
Enhanced object detection performance
Abstract
Unsupervised representation learning methods such as SwAV have proven effective at learning the visual semantics of a target dataset. The main idea behind these methods is that different views of the same image represent the same semantics. In this paper, we introduce an add-on module that injects knowledge of the spatial cross-correlations among samples. This in turn distills intra-class information, including feature-level locations and cross-similarities between instances of the same class. The proposed add-on can be attached to existing methods such as SwAV, and it can later be removed for inference without any modification of the learned weights. Through an extensive set of empirical evaluations, we verify that our method yields improved performance in detecting class activation maps, in top-1 classification accuracy, and…
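For readers unfamiliar with the base method, the swapped-prediction objective of SwAV, which this paper extends, can be sketched as follows. This is a minimal pure-Python illustration of the published SwAV formulation, not code from this paper: the prototype scores are taken as given (in SwAV they are dot products between normalized features and learnable prototypes), eps = 0.05, three Sinkhorn iterations and temperature 0.1 are assumed defaults, and the spatial add-on is not included.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples in the batch, columns = prototypes


def log_softmax(row: List[float]) -> List[float]:
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def sinkhorn(scores: Matrix, eps: float = 0.05, n_iters: int = 3) -> Matrix:
    """Turn prototype scores into soft codes with (near) equal prototype usage."""
    n_samples, n_protos = len(scores), len(scores[0])
    # Subtracting the global max only rescales Q by a constant, removed below.
    m = max(max(row) for row in scores)
    q = [[math.exp((s - m) / eps) for s in row] for row in scores]
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]
    for _ in range(n_iters):
        # Each prototype (column) receives 1/K of the total mass.
        col_sums = [sum(row[k] for row in q) for k in range(n_protos)]
        q = [[v / (col_sums[k] * n_protos) for k, v in enumerate(row)] for row in q]
        # Each sample (row) carries 1/B of the total mass.
        row_sums = [sum(row) for row in q]
        q = [[v / (rs * n_samples) for v in row] for row, rs in zip(q, row_sums)]
    # Rescale so every sample's code is a probability distribution.
    return [[v * n_samples for v in row] for row in q]


def swapped_prediction_loss(scores_1: Matrix, scores_2: Matrix, temperature: float = 0.1) -> float:
    # Codes act as fixed targets (SwAV computes them without gradients).
    codes_1, codes_2 = sinkhorn(scores_1), sinkhorn(scores_2)

    def cross_entropy(codes: Matrix, scores: Matrix) -> float:
        total = 0.0
        for q_row, s_row in zip(codes, scores):
            log_p = log_softmax([s / temperature for s in s_row])
            total -= sum(q * lp for q, lp in zip(q_row, log_p))
        return total / len(codes)

    # Swap: view 1's scores predict view 2's code, and vice versa.
    return 0.5 * (cross_entropy(codes_2, scores_1) + cross_entropy(codes_1, scores_2))


# Usage: 2 images x 3 prototypes, scores for two augmented views of each image.
view_1 = [[0.9, 0.1, -0.2], [0.0, 0.8, 0.3]]
view_2 = [[0.8, 0.2, -0.1], [0.1, 0.7, 0.4]]
loss = swapped_prediction_loss(view_1, view_2)
```

The "swap" is what encodes the idea that different views of the same image share the same semantics: each view must predict the code assigned to the other view.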
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: LARS · Swapping Assignments between Views
