CrossTransformers: spatially-aware few-shot transfer
Carl Doersch, Ankush Gupta, Andrew Zisserman

TL;DR
This paper introduces CrossTransformers, a novel Transformer-based architecture combined with self-supervised learning to improve few-shot transfer in vision systems, addressing supervision collapse and enhancing robustness across tasks and domains.
Contribution
The paper proposes CrossTransformers and a self-supervised training approach to mitigate supervision collapse, enabling more effective few-shot transfer in vision models.
Findings
Achieved state-of-the-art results on Meta-Dataset for transfer learning.
Demonstrated robustness to task and domain shifts.
Showed improved generalization with spatially-aware features.
Abstract
Given new tasks with very little data (such as new classes in a classification problem or a domain shift in the input), performance of modern vision systems degrades remarkably quickly. In this work, we illustrate how the neural network representations which underpin modern vision systems are subject to supervision collapse, whereby they lose any information that is not necessary for performing the training task, including information that may be necessary for transfer to new tasks or domains. We then propose two methods to mitigate this problem. First, we employ self-supervised learning to encourage general-purpose features that transfer better. Second, we propose a novel Transformer-based neural network architecture called CrossTransformers, which can take a small number of labeled images and an unlabeled query, find coarse spatial correspondence between the query and the labeled…
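The correspondence step mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that correspondence is computed as soft attention from each query location over all support locations of one class, and that the class is then scored by the negative squared distance between query features and their attention-aligned support features. The names `class_score`, `w_key`, and `w_value` are hypothetical, the projection matrices are random rather than learned, and sharing one key projection between query and support is a simplification made here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def class_score(query_feats, support_feats, w_key, w_value):
    """Score one class for one query image (illustrative sketch only).

    query_feats:   (P, D) features at P spatial locations of the query.
    support_feats: (N, D) features at all spatial locations of all
                   support images of a single class, flattened.
    w_key:         (D, Dk) hypothetical key projection (shared here).
    w_value:       (D, Dv) hypothetical value projection.
    Returns (score, attention) where attention has shape (P, N).
    """
    q_keys = query_feats @ w_key
    s_keys = support_feats @ w_key
    # Soft correspondence: each query location attends over support locations.
    logits = q_keys @ s_keys.T / np.sqrt(w_key.shape[1])
    attn = softmax(logits, axis=-1)
    # Query-aligned class features: attention-weighted support values.
    aligned = attn @ (support_feats @ w_value)
    q_vals = query_feats @ w_value
    # Assumed scoring rule: mean squared distance over query locations.
    dist = np.sum((q_vals - aligned) ** 2) / query_feats.shape[0]
    return -dist, attn


# Toy usage with random features (a 3x3 query grid, two support images per class).
rng = np.random.default_rng(0)
D = 8
w_key = rng.normal(size=(D, 4))
w_value = rng.normal(size=(D, 4))
query = rng.normal(size=(9, D))
class_a = rng.normal(size=(18, D))
class_b = rng.normal(size=(18, D))
score_a, attn_a = class_score(query, class_a, w_key, w_value)
score_b, attn_b = class_score(query, class_b, w_key, w_value)
predicted = "a" if score_a > score_b else "b"
```

Each row of the attention matrix is a distribution over support locations, so the aligned features are class prototypes built separately for every query location rather than a single pooled vector per class.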
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Remote-Sensing Image Classification
Methods: Linear Layer · CrossTransformers · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Softmax · Label Smoothing · Byte Pair Encoding · Attention Is All You Need · Dense Connections
