Spatial Cross-Attention Improves Self-Supervised Visual Representation Learning

Mehdi Seyfi, Amin Banitalebi-Dehkordi, and Yong Zhang

arXiv:2206.05028 · cs.CV · June 13, 2022

TL;DR

This paper introduces a spatial cross-attention add-on module that improves self-supervised visual representation learning by capturing spatial cross-correlations between same-class samples. The module is removed at inference without modifying the learned weights.

Contribution

The paper proposes an add-on module for existing self-supervised methods such as SwAV. The module injects knowledge of spatial cross-correlations among samples during training and can be removed at inference without modifying the learned weights.

Findings

01 Improved class activation map detection
02 Higher top-1 classification accuracy
03 Enhanced object detection performance

Abstract

Unsupervised representation learning methods such as SwAV have proven effective at learning the visual semantics of a target dataset. The main idea behind these methods is that different views of the same image represent the same semantics. In this paper, we introduce an add-on module that injects knowledge of spatial cross-correlations among samples. This in turn distills intra-class information, including feature-level locations and cross-similarities between same-class instances. The proposed add-on can be attached to existing methods such as SwAV and later removed for inference without any modification of the learned weights. Through an extensive set of empirical evaluations, we verify that our method improves the detection of class activation maps, top-1 classification accuracy, and…
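To make the phrase "spatial cross correlations among the samples" more concrete, here is a minimal sketch of one generic way to compute a location-to-location attention map between the feature maps of two samples. This is not the authors' module: the function names, the cosine similarity, the softmax temperature, and the toy feature values are all assumptions chosen for illustration.

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def softmax(row, temperature=0.1):
    """Numerically stable softmax; the temperature value is an assumption."""
    m = max(row)
    exps = [math.exp((x - m) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_cross_attention(feat_a, feat_b, temperature=0.1):
    """Hypothetical helper, not taken from the paper.

    feat_a, feat_b: lists of per-location feature vectors, i.e. an H x W
    feature grid flattened to H*W entries of C channels each.
    Returns a matrix whose row i is a probability distribution over the
    locations of feat_b, based on cosine similarity to location i of feat_a.
    """
    a = [l2_normalize(v) for v in feat_a]
    b = [l2_normalize(v) for v in feat_b]
    sim = [[sum(x * y for x, y in zip(va, vb)) for vb in b] for va in a]
    return [softmax(row, temperature) for row in sim]

# Toy 2x2 feature maps flattened to 4 locations with 3 channels (made-up values).
feat_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
feat_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

attn = spatial_cross_attention(feat_a, feat_b)
# For each location in sample A, the index of the most similar location in sample B.
best_match = [max(range(len(row)), key=row.__getitem__) for row in attn]
print(best_match)
```

In this toy case each location in the first sample attends most strongly to the location in the second sample that carries the same feature direction, which is the kind of cross-sample spatial correspondence the abstract refers to. How the paper turns such a signal into a training objective is not specified in the text above.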

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: LARS · Swapping Assignments between Views